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Abstract 

Although  the  simplex  method’s  performance  in  solving  linear  program¬ 
ming  problems  is  usually  quite  good,  it  does  not  guarantee  strict  improve¬ 
ment  at  each  iteration  on  degenerate  problems.  Instead  of  trying  to  recog¬ 
nize  and  avoid  degenerate  steps  in  the  simplex  method  (as  some  variants 
do),  we  have  developed  a  new  Phase  I  algorithm  that  is  completely  imper¬ 
vious  to  degeneracy,  with  strict  improvement  attained  at  each  iteration. 
It  is  also  noted  that  the  new  Phase  I  algorithm  is  closely  related  to  a 
number  of  existing  algorithms. 

When  tested  on  the  30  smallest  NETLIB  linear  programming  test 
problems,  the  computational  results  for  the  new  Phase  I  algorithm  were 
almost  3.5  times  faster  than  the  simplex  method;  on  some  problems,  it 
was  over  10  times  faster. 
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On  highly  degenerate  problems,  the  simplex  method  often  “stalls”,  performing 
a  number  of  iterations  at  a  degenerate  point  before  producing  any  improvement 
in  the  objective  value.  Examples  have  been  constructed  by  Hoffman  [11]  and 
Beale  [1]  to  show  that  it  is  theoretically  possible  that  the  iterative  steps  can 
repeat  2uid  thus  cycle  forever,  although  this  phenomenon  is  quite  rare  in  prac¬ 
tice.  Instead  of  trying  to  make  the  simplex  method  more  efficient  by  trying  to 
avoid  stalls  due  to  degeneracy  (or  near  degeneracy),  we  develop  a  new  Phase  I 
algorithm  that  is  completely  impervious  to  degeneracy. 

This  new  method  involves  the  use  of  Ie2»st-squares  subproblems  in  column 
selection,  and  is  shown  to  have  the  property  of  strict  improvement  at  each  iter¬ 
ation,  even  if  every  basic  solution  (in  the  simplex  method  sense)  is  degenerate. 
Like  the  simplex  method,  we  will  show  that  the  new  Ph^^se  I  algorithm  termi¬ 
nates  in  a  finite  number  of  steps,  and  that  in  practice,  this  number  is  often  quite 
low  compared  to  the  simplex  method. 

Our  algorithm  is  quite  similar  to  a  number  of  existing  algorithms,  including 
one  to  solve  the  bounded  least-squares  problem  (found  in  [3]),  and  another  to 
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solve  the  non-negative  least-squares  problem  (found  in  [12]).  In  addition,  our 
algorithm  is  also  closely  related  the  algorithm  developed  by  Dantzig  [5]  and  by 
Van  de  Panne  ic,  Whinston  [14],  R.W.  Cottle  noted  that  although  this  algorithm 
was  designed  to  solve  quadratic  programs  and  thus  has  a  different  goal  in  mind, 
when  applied  to  an  alternate  formulation  of  the  Phase  I  feasibility  problem,  it 
is  similar  to  a  variant  of  our  least-squares  Phase  I  algorithm. 

Section  2  develops  most  of  the  theory  used  by  the  algorithms  presented.  It 
is  concerned  with  finding  a  feasible  solution  to  a  linear  program,  and  after  some 
preliminary  results,  begins  to  look  at  two-variable  least-squares  problems.  The 
two-variable  problem  and  its  positive  least-squares  solution  is  used  to  select  the 
incoming  column,  and  thus  build  an  improved  “basic”  solution.  Conditions  are 
derived  under  which  the  solutions  to  these  least-squares  problems  are  strictly 
positive.  Although  this  is  analogous  in  many  ways  to  the  simplex  method,  it 
is  proved  that  strict  improvement  can  be  guaranteed  at  each  iteration,  even 
in  the  presence  of  degeneracy.  Next,  the  detection  of  infeasibility  is  discussed, 
followed  by  a  detailed  description  of  the  newly  developed  least-squares  Phase 

1  algorithm,  and  discussion  of  a  number  of  variations.  Finally,  equivalences 
between  our  algorithms  and  the  other  related  algorithms  are  discussed. 

Section  3  presents  computational  results  for  the  algorithm  developed  in  Sec¬ 
tion  2.  As  noted,  the  least-squares  Phase  I  algorithm  has  excellent  performance, 
with  run  times  3.5  times  faster  than  LSSOL’s  implementation  of  the  simplex 
method  [8].  On  some  problems,  the  least-squares  Phase  1  algorithm  was  over  10 
times  faster. 

Section  4  summarizes  the  work  presented  here  and  attempts  to  draw  some 
conclusions.  Suggestions  for  future  work  appear  at  the  end  of  this  section. 

2  Finding  a  Feasible  Solution 

2.1  The  Problem 

The  problem  to  be  addressed  is  that  of  the  Phase  I  problem  solved  by  the 
simplex  method.  That  is,  find  a  vector  x  such  that 

Ax  =  b,  (2-1) 

I  >  0, 

where  x  e  38",  6  €  S?",  and  A  €  38'"’^".  Denoting  the  jth  column  of  Ahy  Aj,  we 

assume  Aj  ^  0  V  j,  and  that  6  ^  0,  or  else  x  =  0  is  a  trivial  solution.  Without 

loss  of  generality,  we  edso  assume  that  6  has  been  rescaled  so  that  ||6||2  =  1, 
and  similarly  the  columns  of  A  have  been  rescaled  so  that  ||  Aj  Hj  =  1  V  j. 

Fact  2.1 

Let  B  be  an  indcptndenl  subset  of  the  columns  of  A,  and  let  xb  satisfy 
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Bxb  =  6, 

Xg  >  0. 


K2-2) 


Then  by  setting  Xj  —  0  for  Aj  ^  B,  it  is  a  trivial  matter  to  construct  an  x*  from 
xb  such  that  x*  is  a  solution  to  (2-1).  (See  [6]  for  a  more  detailed  treatment  of 
the  relationship  between  (2-1)  and  (2-2).) 

The  main  objective  of  the  least-squares  Phase  1  algorithm  to  be  presented 
here  is  to  find  such  a  matrix  B  (if  it  exists),  and  positive  weightings  xg,  such 
that  B  and  xg  comprise  a  solution  to  (2-2).  In  the  remainder  of  this  discussion, 
we  will  refer  to  this  matrix  S  as  a  basis.  Note  that  this  notion  of  a  basis 
differs  from  that  of  the  simplex  method,  as  our  basis  B  may  contain  less  than 
m  columns  and  that  these  columns  may  not  be  sufficient  in  themselves  to  span 
the  colunui  space  of  A. 

2.2  Preliminaries 

We  need  to  consider  least-squares  problems  with  positive  solutions  before  we  can 
present  the  algorithm.  But  before  we  can  consider  such  problems,  we  need  a 
few  more  preliminary  facts  and  results.  We  will  be  making  use  of  the  Euclidean 
norm,  which  we  will  denote  by  HU. 

Fact  2.2 

Let  b,v,p  €  5R”*,  b  ^  0,  p  ^  0,  v  ^  0,  and  v  ^  b.  Let  u  =  b  —  v.  If  \  •  v  is  closer 
to  b  than  any  other  scalar  multiple  A,  then  we  can  conclude  that 


(a) 

T 

V^V 

= 

bn  >  0; 

(b) 

T 

U*V 

= 

0; 

(c) 

u^u 

= 

bn  >  0; 

(d) 

bn 

= 

un  +  v'^v; 

(e) 

ifpn 

< 

0,  then  1  •  v  is  closer  to  b  than  any  scalar  multiple  pp  of  p, 

>  0; 

(f) 

If  pn 

> 

0,  then  1  •  r  is  closer  to  b  than  pp  for  all  p  >  0  if  and 

only  if 

II  HI 

> 

p^’VIlptl; 

(g) 

//p# 

0 

and  V  ^  pp,  then  v^v  p^p  —  {v^pY  >  0. 

Proof: 

(a)  The  hypotheses  imply  that  A  =  1  is  the  solution  to 

m^in  II  ft  —  Av  Ip. 
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Therefore 


-^(6  —  Xv)^{b  —  Av)  =  0  for  A  =  1, 
dX 

and  v^v  =  b^v  follows.  We  also  have  b^v  >  0  since  v  ^  0. 

(b)  Since  «  =  6  -  v  by  definition,  u^v  =  (6  —  v^'v  =  0  by  (a). 

(c)  It  follows  that  u^u  =  «^(6  —  «)  =  u^b. 

(d)  We  have  b^b  =  6^(u  +  u)  =  u^u  +  v^v  by  (a)  and  (c). 

(e)  The  unconstrained  minimum  of  the  problem 

nun  II 6 -^p  II 

occurs  at  p*  =  p^b/p^p,  and  if  p^b  <  0,  we  have  p*  <  0.  Because  ||  b—pp  |p 
is  a  convex  function  of  p  with  a  minimum  at  p*,  we  can  conclude  if  p^b  <  0, 

min  II 6 -pp  II 
^>0 

must  occur  at  p  =  0.  Thus, 

nun||6-pp||  =  ||6||. 

lt>0 

Now  consider 

II 6  -  u  II  =  6^6  -  v^v  <  II 6 1|. 

If  p^b  <  0,  we  have 

||6-u||  <  ||6||  =min||6-pp||. 

^>0 

(f)  From  part  (e),  we  know  that  the  minimum  of  the  problem 

minll6-  ppII^ 

occurs  at  p*  =  p^b/p^p,  and  we  see  that  if  p^b  >  0,  we  have  p*  >  0.  Thus, 
min||6-pp|p  =  min||6-pp|p. 

ft  ^>0 

If  V  is  closer  to  6  than  is  pp  for  all  p  >  0,  then  we  have 
||6-u||^  <  nun||6-pp||2 

*4>0 

b^b  -  2v^b  +  v^v  <  min(6^6  —  2pp^b  +  p^p^p) 

lt>0 

—v^v  <  —  ^  ■  by  Fact  2.2,  part  (a). 

pip 
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Fact  2.5 

Let  {/Ij, y42, .  ■  • , -4*}  f>e  a  set  of  linearly  independent  columns.  Let  be  the 
unique  least-squares  solution  to  min  ||  i4,x,  ||,  and  let  Let 

r*  ^  x°  be  any  other  weighting  of  the  columns  Ai,  and  let  v*  =  ^iAixj.  Then 
II 6  —  (1  —  X)v^  —  Xv°  Ip  is  monotonically  increasing  from  A  =  1  A  =  0. 

Proof: 

II 6  -  (1  -  A)t;'  -  Xv°  IP  =  II 6  -  |p  -  2A(t  -  v'f{v°  -  t;’)  +  A^y  t;'  -  v°  |p 

is  a  strictly  convex  function,  and  its  minimum  occurs  at  A  =  1.  Thus  this 
quadratic  function  of  A  strictly  increases  from  A  =  1  to  A  =  0.  □ 

Corollary  2.6 

Let  V  =  (1  —  A)v*  +  Aw'’.  If  v°  is  closer  to  h  than  is  w’,  then  u^u  <  ufui,  where 
ui  =  6  —  w* ,  and  u  —  b  —  v  for  allO  <  X  <  1. 

Proof:  Let  uq  =  b—v^.  We  know  that  «o  ^  «i  because  «o«o  <  Therefore 

II  «i  -  «o  IP  =  «r«i  +  «o^“o  -  2uj’uo  >  0, 
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and  2uJuo  <  u^ui  +  uJuq.  Hence, 

vTu  =  ^Auo  +  (1  -  A)«ij  ^A«o  +  (1  -  A)ui^ 

=  A^ujiio  +  (1  —  A)^nj«i  -I-  2A(1  -  A)uftio 

<  A^ujuo  +  (1  -  +  A(1  -  A)(u^tii  +  UqUo) 

=  (1  -  A)«i’«i  +  AuqUo 

<  (1  —  A)«fui  +  \ujui  (because  ujuo  <  ) 

T 

=  «;«!• 

Thus  any  nontrivial  convex  combination  of  two  vectors  and  u'  as  shown  above 
is  strictly  closer  to  b  than  whichever  vector  or  w*  is  farthest  away  from  6.  □ 

Corollary  2.7 

Let  A  €  3?"*’^”  be  made  up  of  linearly  independent  columns,  and  let  x°  be  the 
unique  ieast-squares  solution  to  min  ||  6  —  ^4*  ||.  Let  v°  =  Ax°.  Assume  v°  ^  0. 
//u*  >  0,  where  u*  is  otherwise  arbitrary,  then  a  convex  combination  v  ofv°  and 
V*  can  be  found  to  generate  a  v  >  0  closer  to  b  than  v*  for  some  A,  0<  A<  1. 

Proof:  From  Fact  2.5,  we  know  that  (|6  -  (1  -  A)t;‘  -  Av®  |p  is  monotonically 
increasing  from  A  =  1  to  A  =  0.  Thus  any  convex  combination,  u  =  (1  -  \)v^  + 
Xv°,  has  the  property  that 

||6-t;M|^>||6-t;||2>|(6-t;°|p  (A,^0,A9^1) 

If  we  want  v  >  0  with  v  closer  to  6  than  w*,  start  with  A  =  1  and  decrease  A 
until  all  components  of  v  are  >  0.  This  is  possible  because  u*  >  0.  □ 

Fact  2.8 

Given  x  >  0,  y  0,  the  minimum  A  required  to  make  the  vector  Aj:+(1  — A)y  >  0 

IS 

».  -Vi 

A  =  max - . 

y.<0  Xi  -  yi 

This  concludes  the  necessary  preliminary  results. 

2.3  Least  Squares  with  Positive  Solutions 

In  this  section,  we  will  consider  the  conditions  under  which  the  solution  to  the 
(unconstrained)  two-variable  least-squares  problem 

min  F(o,/?)  =  min||6  —  av  —  ^pIP  (2-3) 

0,0 

is  unique,  and  the  implications  of  a  non-positive  solution.  Then  we  will  derive 
conditions  under  which  the  solution  is  strictly  positive. 


Theorem  2.9 

Let  p  ^  0,  t;  /  pp  and  u  =  b  —  v.  Lei  rain  F  =  Fq.  Then, 

(a)  the  values  o/(ao,/?o)  yielding  Fq  are  unique; 

(b)  Fq  =  «^u  if  and  only  if  (ao,/?o)  =  (1.0)- 
Proof: 

(a)  Noting  that  the  condition  v  ^  fip  implies  v  and  p  are  linearly  independent, 

this  follows  directly  from  Fact  2.4. 

(b)  If  (ao,Po)  =  (1,0),  then  we  see  that  Fo  =  vTu.  By  part  (a),  we  know  that 

the  values  of  (ao,/?o)  yielding  Fq  are  unique.  Therefore,  the  converse  also 
holds;  that  is,  Fq  =  u^u  =  ||6  —  1  v  —  0  p||^  implies  (ao,/?o)  =  (1,0).  □ 

Corollary  2.10 

let  1  •  t;  6c  closer  to  b  than  any  other  Xv.  If  Fq  ^  u^u,  then  Fq  <  u'^u  and 
(Lq  i-  0.  In  addition,  if  min{oo,/Jo}  <  0,  then  Fq  < 

Proof:  If  Fo  ^  u^u,  we  can  conclude  that  Fq  <  u^u,  as  F  =  u^u  is  obtainable 
at(a,^)  =  (l,0). 

Now  assume  that  Fo  <  u^u  and  0o  =  0.  Thus,  +  Pop  =  Since  v 
is  closer  to  6  than  is  any  Xv  (A  ^  1),  oqv  cannot  be  as  close  to  6  as  is  r  for 
ao  ^  1.  Therefore  if  /?o  =  0,  then  ao  =  1.  From  Theorem  2.9,  we  know  that  if 
(ao,Po)  =  (1,0),  then  F  =  u^u,  which  is  a  contradiction. 

In  Theorem  2.9,  we  showed  that  Fq  =  v^u  if  and  only  if  (ao,/?o)  =  (1>0)- 
We  just  showed  that  if  Fq  ^  u^u,  then  Fo  <  u^u.  If  min{ao,,0o}  <  0,  then 
(«o,/3o)  ^  (1,0).  Thus  we  can  conclude  that  Fo  ^  u^u,  which  in  turn  implies 
that  Fo  <  u^u.  □ 

Theorem  2.11 

Let  1  ■  V  be  closer  to  b  than  any  other  Xv.  Also  let  v  ^  pp  and  let  v  be  closer  to 
b  than  any  pp,  p  >  0.  Let  Fo  =  minF,  and  let  (ao,/?o)  be  the  unique  solution 
corresponding  to  Fq.  Let  vq  =  oqv  +  flop.  // minjao, /?o)  <  0,  then  there  exist 
no  (ai,Pi)  >  0  such  that  Vi  =  0|U  +  /^ip  is  closer  to  b  than  is  v. 

Proof:  Assume  3  {a\,0\)  >  0  such  that  is  closer  to  6  than  is  u,  i.e.,  ||  6— r  ||  > 
\\b-vi  ||. 

Case  1:  ao  <  0,  j^o  >  0. 

Let  t)  be  a  convex  combination  of  vq  and  wi : 

V  =  Xvo  +  (1  -  A)vi 

=  ^Aoo  +  (1  -  A)ai^t;+  (a/?o  +  (1  -  A),di^p 
=  av  -*  $p. 
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where  a  =  Aao  +  (1  —  A)oi  and  0  =  Xpo  +  (1  ~  ^)0i- 

Since  vq  is  formed  from  the  unique  solution  to  min  F,  we  know  that  vq  is 
closer  to  b  thM  is  vi.  Thus  by  Corollary  2.6,  we  can  say  that  any  convex 
combination  of  vq  and  W]  is  at  least  as  close  to  6  as  is  vj.  That  is,  ||6  —  ii  ||  < 
||6  —  vi  ||.  At  some  point  between  vq  and  wi,  a  =  0,  since  oq  <  0  and  Oj  >  0. 
At  such  a  point,  we  have  €  =  0p.  However,  we  know  that  u  is  closer  to  b  than 
any  0p  for  /3  >  0,  so  u  is  closer  to  6  than  is  v.  That  is,  ||  6  —  u  ||  <  ||  6  —  ti  |j.  We 
also  know  that  t;  is  at  least  as  close  to  6  as  is  V] ,  i.e.,  ||  6  —  {i  ||  <  ||  6  —  vi  ||.  Thus 
we  have  ||  6  —  u ||  <  || 6  —  ui  ||,  which  contradicts  the  assumption. 

Case  2:  0o  <  0,  no  conditions  on  oq. 

As  in  Case  1,  let  ii  be  a  convex  combination  of  vq  and  vi: 

ti  =  Auo  +  (l-A)t)i 
=  av  +  0p, 

where  a  and  0  are  as  defined  above  in  Case  1.  Also  as  in  Case  1,  we  can  conclude 
that  any  convex  combination  of  uq  and  vi  is  at  least  as  close  to  6  as  is  uj .  That 
is,  ||6-ti||  <  ||6-t;i  1|.  At  some  point  between  t»o  and  uj,  we  have  0=0.  At 
such  a  point,  v  =  av.  However,  we  know  that  v  is  closer  to  b  than  any  av  for 
a  ^  1.  So  u  is  at  least  as  close  to  6  as  is  5,  i.e.,  ||fe  — v||  <  ||6-  r||.  We  also 
know  that  ||6  -  ’*  ||  <  ||  6  —  uj  ||.  Thus  we  have  ||  6  —  v  ||  <  ||  fr  —  vi  ||,  which  again 
contradicts  the  assumption.  □ 


Now  we  will  derive  conditions  under  which  the  solution  (ao,0o)  to 

min  F  =  min  \\b  —  av  —  0p |p 

a.P 

is  strictly  positive. 

Theorem  2.12 

Let  V  ^  ftp,  p  5^  0.  Let  v  be  closer  to  b  than  ■'ny  Av,  A  ^  1  and  let  u  =  b  —  v. 
Let  F  and  (ao,0o)  be  as  defined  in  Theorem  2.11.  Then  0p  >  0  if  and  only  if 
p^u  >  0. 

Proof:  Solving  for  0o  at  the  minimum  of  F,  we  get 

_  b^p  v^v  —  b^v  p^v 
^  fFp  v^v  —  (p^v)2 

Consider  the  sign  of  0o-  From  Fact  2.2,  we  know  that  p^pv^v  —  (p^v)^  >  0,  so 
we  need  only  consider  the  numerator: 

p^b  v'^v  —  b^v  p^v  =  p^6  v^v  —  p'^v  v^v  (by  Fact  2.2,  part  (a)) 

T  T 

=  V  vp  u. 

Since  v  0,  we  know  that  v^v  >  0.  Thus  the  sign  of  p^u  determines  the  sign 
of  the  numerator  and  hence  the  sign  of  0o.  The  result  follows.  □ 
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Theorem  2.13 

Given  the  same  conditions  as  in  Theorem  2.12  with  the  addition  that  v  be  closer 
to  b  than  is  /ip  for  p  >  0,  then  ao  >  0  if  p^u  >  0. 

Proof;  Solving  for  oq  at  the  miaimum  of  F,  we  get 

v^v  p^p  —  (p^v)^  —  p^u  p^v 
p'^p  v'^v  —  (p^n)2 

From  Fact  2.2,  we  know  that  p^pv^v  —  (p^v)^  >  0,  so  the  sign  of  oq  is  the  same 
as  the  sign  of  the  numerator  v^v  p^p  —  (p^v)^  —  p^u  p^v 

Case  1:  p^b  >  0: 

From  Fact  2.2  we  k«ow  that  p^p  >  {p^b)^.  Therefore, 
v^v  p^p  —  (p^v)^  —  p^u  p^v  >  {p^b)^  —  {p^v)^  —  p^v  p^u 

=  (pV-(pV-P^t;p^6  +  (pV 

=  p^b(p^b  —  p^v) 

—  p^bp^u  >  0, 

since  p^b  >  0  and  p^u  >  0.  Thus  in  Case  I,  ao  >  0  if  p^u  >  0. 

Case  2:  p^b  <  0: 

We  know  p^u  >  0  and  thus  p^b  —  p^v  >  0.  B'lt  p^b  <  0.  so  p^v  must  be 
negative.  Again  we  consider  the  sign  of  the  numerator  of  on  By  Fact  2.2, 
we  know  v'^v  p^p  —  (p^v)^  >  0,  so  consider  — p^u  p^v.  However,  we  know  that 
p^u  >  0  and  p^v  <  0.  Thus  —p^up^v  >  0.  Thus  in  Case  2,  qq  >  0  if  p^u  >  0  □ 

Corollary  2.14 

Given  the  conditions  of  Theorem  2.13,  the  unconditional  minimum  of  F  is  at 
{ao,0o)  >0  if  u^p  >  0. 

Proof:  This  follows  directly  from  Theorems  2.12  and  2.1.3.  □ 

2.4  Column  Selection  Criteria 

We  have  considered  the  problem  (2-3),  and  conditions  under  which  (oq,  fin)  >0. 
So  far,  we  have  held  both  v  and  p  fixed.  We  now  consider  holding  only  v  fixed, 
but  allowing  p  to  be  chosen  as  one  of  the  columns  of  the  matrix  A  from  (2-1). 
In  particuleu”,  we  select  that  column  A,  whose  nonnegative  combination  with 
V  brings  us  closer  to  6  than  any  other  Aj,  j  ^  s.  This  is  uone  by  solving  the 
problem 

min  G  =  min  (  min  ||  6  —  av  —  PAj  .  (2-4) 

j  \  0,0  ) 
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Before  finding  the  minimum  of  G  in  the  general  case,  we  will  first  consider  the 
simpler  problem  of  finding  Aj  =  A,  that  yields  the  minimum  of  G  when  t)  =  0. 
The  solution  /?  --  /?o  will  be  used  to  set  v  =  fioA,  as  the  initial  approximation 
to  6. 

Theorem  2.15 

IfbT'Aj  >  0  for  some  j,  then  the  solution  to 

min  ^inf  II 6  - (2-5) 

IS  attained  at  /?o  =  b^A,  where 

s  =  argmax  b^Aj .  (2-6) 

i 

Otherwise,  tfb^Aj  <  0  for  all  j,  then  no  matter  which  j  is  chosen,  the  inf  of  (2- 
5)  is  attained  at  0  —  Q.  In  the  latter  case,  the  closest  nonnegative  approximation 
to  b  IS  given  by  x  =  0. 

Proof:  First  consider 


min  F  =  min  ||  6  —  0Aj  |p. 

0 


Because  the  columns  of  A  are  normalized,  the  minimum  of  F  occurs  at 


00 


b^Aj  _ 
AjAj 


=  b‘Ai 


Evaluating  f  at  /?o,  we  get 


F 


^0=00 


{b  -  0oAjf(b  -  0oAj) 

bn  -  (6%)2. 


In  choosing  a  particular  Aj,  we  want  to  minimize  this  value  subject  to  0o  >  0. 
This  corresponds  to  maximizing  {b^Aj)^  over  ail  positive  b^/ij.  If  b^Aj  <  0  for 
all  j,  then  the  inf  of  problem  (2-5)  is  clearly  attained  at  /?  =  0,  regardless  of  the 
choice  of  j.  □ 

We  set  V  =  b^AjA,  as  the  initial  approximation  to  b  (where  s  is  defined 
by  (2-6)),  and  we  proceed  to  seek  to  improve  this  approximation  by  considering 
the  problem  (2-4).  For  convenience  of  discussion,  we  assume  that  s  defined 
by  (2-6)  is  unique,  although  this  is  not  a  necessary  condition. 
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Theorem  2.16 

Let  \  v  he  closer  to  6  than  any  scalar  multiple  Xv  for  A  1.  Let  fiAj  ^  v  'ij, 
where  the  columns  Aj  are  normalized  as  in  (2-1).  Let  v  be  closer  to  b  than  fiAj 
for  /i  >  0  and  all  j,  and  let  u  =  b  —  v.  Under  these  conditions,  the  solution  to 

min  (  inf  116  —  at)  — |P  |  (2-7) 

is  attained  at  j  =  s  where 

AJu 

s  =  argmax  - - ^ ,  (2-8) 

^  ^t)^t)  —  (i4jt))2  j 

provided  that  3  j  :  Aju  >0.  If  3  no  j  such  that  Aju  >  0,  then  any  (a,  /I)  >  0 
will  produce  a  value  of  (2-7)  that  is  greater  than  or  equal  to  \\b  —  v  ||^. 

Proof:  First  consider 


min  Fj  =  min  ||  6  —  ov  —  ^Aj 

a,0 


(2-9) 


From  the  proofs  of  Theorem  2.12  and  Theorem  2.13,  we  know  that  the  uncon¬ 
ditional  minimum  of  Fj  occurs  at  {aj,0j)  where 

/  AjAj  v^v  —  AJv  Ajb  v^v  Aju  \ 

^  y  AjAj  v'^v  —  (Ajvp  ’  AJAj  v'^v  —  (Ajvy  J 

Since  AjAj  =  1,  this  can  be  rewritten  as 

y  v^v  —  (Ajvy  ’  v'^v  —  (Ajv)^  J 

Evaluating  Fj  at  the  minimum  (aj,jSj),  we  get 

Fj\a=a,  =  {b  -  OjV  -  0jAj)'’'{b  -  OjV  -  PjAj) 

0=0, 

=  6^(6  -  otjV  -  PjAj)  -  ajv'^{b-ajV  -  PjAj)  -  0jAj{b  -  OjV  -  0jAj) 
=  b'^{b  -  QjV  -  0jAj) 

=  6^6  —  Oj  v^v  —  0j  b^Aj  (from  Fact  2.2,  part  (a)) 


=  b‘ 


( (t)^v)^  -  v^v  b'^Aj  v^Aj  \  (  Aju  b'^Aj  \ 

\  v'^v  —  {v'^Ajy  )  y  u'^’i)  —  (u'^Aj)^  y 


=  6^6 


v'^v(^ 


v'^v  -  26%  v'^Aj  +  {Ajbf 
v'^v  —  {v^Aj  y 


) 
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We  choose  j4,  so  as  to  miniimze  Fj.  This  corresponds  to  maximizing  the 
following  over  Aj : 

( -  2b'’^Aj  v^Aj  +  (6'^74j)^\  _  (Af(b  -  v))^ 

\  v'^v  —  (v'^Aj)^  J  v'^v  —  {Ajvy 

which  is  the  same  as  choosing  j  =  s  so  as  to  maximize 

{Ajib-v)f  _  (Ajnf 
v'^v  —  (Ajvy  v'^v  —  (AJv)^ 

Let  us  consider  the  solution  (ao./?o)  to  (2-9)-  Let  minF  =  F{ao,l3o)  =  Po- 
Note  that  by  Theorem  2.9,  the  solution  (ao.^o)  is  unique,  and  any  other  solution 
(ofi,/?i)  ^  (ooi/Jo)  must  have  the  property  that  F(ai,/3i)  >  F{ao,^o)- 

Case  1:  u^Aj  =  0 

From  the  proof  of  Theorem  2.12  and  Theorem  2.13,  we  know  that  (otoi  /^o)  = 
(1,0),  and  F(ao,/?o)  =  —  t^lP-  If  we  were  to  insist  that  we  minimize  over 

strictly  positive  (a,/?),  we  would  get  an  >  (tj,f2)  >  0  for  some  (ei.ea), 

and(ai,/?i)  ^  (ao,/?o)-  Thus  from  Theorem  2.9  and  Corollary  2.10,  F(ai,fii)  > 
F(qo,0o),  or  in  other  words,  F{a\,0i)  >  ||6- w|p. 

Case  2;  u^Aj  <  0 

From  the  proof  of  Theorem  2.12,  we  see  that  0o  <  0.  From  Theorem  2.11, 
we  know  that  there  is  no  (ai,/3i)  >  0  such  that  F{a\,0i)  <  ||6  — v  |p.  Therefore 
if  we  were  to  insist  that  we  minimize  over  strictly  positive  (a,  J),  we  would  get 
an  (ai,/?i)  >  (ei,e2)  >  0  such  that  F(ai,0i)  >  ||6-  v||^. 

Case  3:  u^Aj  >  0 

From  Corollary  2.14,  we  know  that  {ao,0o)  >  0.  We  also  know  that  F  = 
II 6  —  t)|p  is  attainable  with  (a,/?)  =  (1,0).  Because  the  solution  is  unique,  we 
can  conclude  that  F(l,  0)  >  F(ao,/?o).  or  in  other  words,  F(oo,/?o)  <  II  IP- 
Only  in  Case  3  are  we  able  to  find  an  {q,0)  >  0  such  that 

min  II 6  -  an  -  0Aj  (p  <  ||  6  —  v  |p. 

Therefore,  the  only  columns  to  consider  to  improve  the  approximation  v  are 
those  such  that  u^Aj  >  0,  provided  that  any  such  columns  exist.  Thus  from  the 
solution  we  computed  for  problem  (2-9),  we  see  that  the  solution  to  to  (2-7)  is 
given  by 

AJu 

\  1/2  ’ 

v'^v  —  (Ajvyj 

provided  that  3  j  :  Aju  >  0.  We  also  showed  that  if  >4^u  <  0,  then  any  solution 
with  (a,  0)  >  0  will  produce  a  minimum  of  (2-7)  that  is  >  ||  6  —  n  |p  (and  hence 
(a,  /?)  =  (1,0)  is  as  good  a  solution  as  any  possible).  □ 


s  =  argmax 
;  :4j’u>0 
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We  will  eventually  consider  more  carefully  the  implications  of  the  situation 
when  <  0,  but  we  need  more  results  first. 

2.5  Basic  Weightings 

A  set  of  linearly  independent  colurrms  of  A,  {Aj,  A2,  ■  •  • ,  A*},  will  be  denoted 
by  B,  and  be  called  a  basis',  the  set  of  indices  {1,2, . . . ,  fe}  will  be  referred  to  as 
basic  indices.  A  nonnegative  weighting  of  a  basis  is  any  v  =  Bxb  with  xb  >0. 
It  is  a  positive  weighting  if  xg  >  0.  A  positive  weighting  such  that  v  is  closer 
to  6  than  is  any  other  positive  weighting  will  be  referred  to  as  a  basic  weighting. 

Given  a  basis,  there  may  or  may  not  exist  a  positive  weighting  that  is  closest 
to  b.  That  is  to  say,  the  closest  nonnegative  weighting  may  have  some  weights 
(*B)i  =  0i  in  which  case  it  is  a  basic  weighting  for  the  subset  of  columns  of  B 
where  (xb)i  >  0. 

Now  consider  the  least-squares  problem 

min  II  b—  Bxb  IP 

Let  Xg  be  the  solution  and  let  v°  =  Bx%.  We  will  refer  to  such  a  as  the 
least-squares  weighting  of  B.  Notice  that  this  implies  that  a  positive  weighting 
of  a  basis  S  is  a  basic  weighting  if  and  only  if  it  is  the  least-squares  weighting 
of  the  columns  of  B. 

Fact  2.17 

Any  basic  weighting  v  of  a  basis  B  is  unique. 

Theorem  2.18 

Let  V  =  Bxb  be  a  positive  weighting  of  the  columns  of  B.  Let  u  =  b  —  v.  Then 
a  necessary  and  sufficient  condition  that  v  be  a  basic  weighting  is  xFB  —  0. 

Proof;  We  know  that  if  t;  is  a  positive  weighting,  then  v  is  a  basic  weighting 
if  and  only  if  it  is  a  least-squares  weighting.  Thus  it  is  equivalent  to  prove 
that  a  necessary  and  sufficient  condition  that  v  be  a  least-squares  weighting  is 
u^B  =  0.  The  normal  equations,  which  are  necessary  and  sufficient  conditions 
for  V  to  be  the  least-squares  weighting,  are 

B^(Sxb  -6)  =  0 

Noting  Bxb  — b  =  v-  b  —  u,  the  result  follows.  □ 

Corollary  2.19 

Given  v,  a  basic  weighting  corresponding  to  B,  and  u^A,  ^  0  where  u  —  b  —  v, 
then  {fl.  A,}  is  a  set  of  linearly  independent  columns. 

Proof:  By  Theorem  2.18,  u^B  =  0  because  v  is  a  basic  weighting.  Since 
u^A,  ^  0,  we  know  that  A,  is  not  contained  in  the  space  spanned  by  the 
columns  of  B.  (Otherwise  A,  would  be  a  linear  combination  of  the  columns  of 
B,  and  we  would  have  u^A,  =  0.)  □ 
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Corollary  2.20 

Ifv  is  a  basic  weighting  corresponding  to  B,  and  u  =  b—v,  then  if  we  select  A, 
by  the  criterion  (2-8),  where  >  0,  then  A,  is  linearly  independent  of  the 
columns  in  B. 

Proof:  This  follows  directly  from  Corollary  2.19,  as  Aju  ^  0.  □ 

Corollary  2.21 

If  A,  is  chosen  as  in  Corollary  2.20,  then  the  least-squares  problem 

min  II 6  -  fliB  -  >1***  11^  (2-10) 


has  a  unique  solution. 

Proof:  B  has  linearly  independent  columns.  By  Corollary  2.20,  is  also 

a  set  of  linearly  independent  columns.  Therefore  by  Fact  2.4,  the  least-squares 
problem  above  has  a  unique  solution.  □ 


2.6  Strict  Improvement 

Theorem  2.22 

Let  V  =  Bxg  be  a  basic  weighting  corresponding  to  basis  B  and  let  u  =  b  —  v. 
Let  V  be  closer  to  b  than  nA,  for  >  0  and  let  A,  be  linearly  independent  of 
the  columns  of  B.  Let  =  Bxg  +  Afxl  be  the  unique  least-squares  weighting 
of  the  problem  (2-10),  and  let  Uj  =  6  — Dj.  Ifu^A,  >  0,  then  the  least-squares 
weighting  satisfies  xj  >  0. 

Proof:  We  know  v  is  closer  to  6  than  is  Xv  for  A  ^  1  because  t;  is  a  basic 
weighting.  We  also  know  A,  ^  liv  because  u^v  =  0  by  Fact  2.2,  and  we  know 
that  u^j4j  >  0.  Let  F  =  ||6  —  ou  —  x,i4,  ||^.  Let  a  and  x,  be  the  values  of  a 
and  X,  at  the  minimum  of  F.  From  Corollary  2.14,  we  can  conclude  that  both 
a  and  x,  are  positive.  Finally,  let  t;  =  x,A,  -1-  av.  Since  u'  is  the  lejist-squares 
weighting  of  {B,  ^4,},  we  have  ||  ^  —  v’  |P  ^  II  ^  ^  IP- 

We  need  to  show  that  ||  —  5  |P  <  11  ft  —  u  ||^-  We  know 

II 6  -  u  11^  =  ft^ft  -  2v^v  -I-  v^v  (from  Fact  2.2) 

=  ft^ft  —  v^v. 


From  the  proof  of  Theorem  2.16,  we  know  that 


and  that 


.  v'^v  Aju 
"  vT-v  -  {Ajv)^ 


and 


_  v^v  —  AJv  Ajb 
v^v  —  {AfvY 


II  ft  —  ii  Ip  =  ft^ft  —  v^v 


(AW 

v^v  —  (AjvY 
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Proof:  This  follows  directly  from  (2-11).  □ 

Now  consider  the  following  situation.  As  before,  assume  we  have  a  basis  B, 
a  basic  weighting  v  =  Bxb,  and  u  =  b  —  v.  Now  let  A,  be  another  column 
from  A,  where  A,  is  linearly  independent  of  the  columns  in  B,  u^A,  >  0,  and 
V  is  closer  to  b  than  is  fiA,  for  p  >  0.  The  linearly  independent  set  of  columns 
{B,A,},  and  the  solution 

min  II 6-  Bya  -  A,y,  |p 

has  the  property  that  yj  >  0  by  Theorem  2.22.  Note  however,  that  we  cannot 
say  that  yg  >  0. 
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Theorem  2.24 

Given  the  situation  described  above,  we  can  find  a  new  basis  B  where  B  is 
formed  from  a  subset  of  the  columns  of  (B  A,  ),  and  the  solution  xb  >  0  has 
the  property  that 

min  II 6  —  Bxb  |P  <  min  ||i  —  Bxb  |P- 

In  addition,  A,  will  be  one  of  the  columns  in  B. 

T 

Proof:  Let  J/’  =  (  j/b  Vs)  solution  to 

min||6-(B  i45)y|p. 

Because  u^A,  >  0,  we  can  use  Theorem  2.22  to  assert  that  y’  >  0.  From 
Corollary  2.23,  we  also  have  the  fact  that 

\\b-BxB\\^>\\b-{B  yl.)y*|p. 

In  order  to  continue,  we  need  some  additional  notation.  We  will  be  forming 
B  from  {B  ),  and  we  may  need  to  iteratively  delete  a  number  of  the  columns 
of  B.  We  will  denote  the  subset  of  the  coluntms  remaining  at  some  iteration  by 
B^  (indicating  that  B^  corresponds  to  the  current  yg).  Initially,  this  means 
that  B  =  B«. 

Case  1:  j/*  >  0 

Let  B  =  {B^  >1, )  and  let  xb  =  y*.  This  assignment  is  valid,  since  y'  is  a 

basic  weighting  of  the  columns  of  ( B^  A,  ),  and  A,  is  one  of  the  columns  of 
B. 

Case  2:  y*  >  0 

Let  y*  be  the  positive  elements  of  y*.  Remember  that  y*  >  0,  so  y*  6  y*. 
Let  B^  be  the  columns  of  B^  corresponding  to  the  elements  of  y*.  Let  B  = 
{  B^  A,)  and  let  xb  =  y*.  Again,  this  is  valid  because  y*  is  a  basic  weighting 
of  the  columns  of  (  B^  A, ),  and  A,  is  one  of  the  columns  of  B. 

Case  3;  We  do  not  have  y*  >  0. 

In  this  ceise,  we  know  that  at  least  one  element  of  y^  is  negative,  and  we 
have  a  significantly  more  complicated  situation.  We  need  to  drop  one  or  more 
columns  from  {B^  A,)  such  that  there  exists  a  positive  least-squares  solution 

for  the  remaining  columns.  In  order  to  do  this,  we  must  redefine  B®'  and  y*, 
and  then  go  bjw:k  and  check  the  sign  of  y*  against  Cases  1-3  again.  This  means 
that  we  could  perform  the  steps  (to  be  described)  in  Case  3  more  than  once. 

The  first  time  through  Case  3,  we  define  x'  =  xb,  and  we  have  B®  =  B. 
(Note  that  neither  equation  will  be  true  in  subsequent  times  through  Case  3.) 
We  now  form  the  convex  combination 
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and  seek  the  smalled  A  <  1  such  that  c  >  0.  This  A  is  determined  by 

A  =  max  — - — ^ . 

»;<0  -  y) 

It  is  clear  for  A  G  (0, 1)  that  c,  >  0.  We  will  drop  all  columns  of  ( B*  ^, ) 
corresponding  to  zero  elements  of  c. 

Each  pass  through  Case  3,  one  convex  combination  is  formed.  In  the  first 
pass,  we  have  B*  =  B,  and  we  have  proved  that  A,  is  retained  the  first  time  a 
convex  combination  is  formed  during  the  column-dropping  process.  We  need  to 
prove  that  no  matter  how  many  convex  combinations  (and  corresponding  passes 
through  Case  3)  are  necessary,  is  not  dropped.  In  order  to  prove  this,  we 
must  redefine  x'^,  B*  and  y*  as  follows. 

Let  c  be  the  positive  elements  of  c.  Let  be  the  columns  of  B*'  corre¬ 
sponding  to  the  elements  of  c.  Let  y*  be  the  solution  to 

min||6-(B*'  >4,  )y|p. 

In  order  to  redefine  a:*  to  be  c,  we  need 

|:>«-Bxb1|'>||6-(BS'  .4.)c||2>||6-(B!'  >1,)!/’||^  (2-12) 

In  addition,  in  order  to  redefine  y*  to  be  y*  and  B*'  to  be  B*',  we  need 

y;>0.  (2-13) 


We  have 

\\b-BxB\\'^>\\b-{By  A,)cf>\\b-iBy  A,)y-\f 

from  Fact  2.5.  We  also  know  that 

li6-(Bv  ^)c|p  =  ii6-(^v  A,)cf 
from  the  definition  of  c  and  Ay ,  and  since  y*  is  a  least-squares  solution,  we  have 

\\b-(By  A.)cf>\\b-{By  A,)r\f. 

Finally,  both  y*  and  y*  are  least-squares  solutions,  but  B“  is  a  proper  subset  of 
B*,  so 

|16-(Bv  ^)y*||2>l|6-(Bv  ^)y*ir 
Altogether,  this  gives  us 

||6-BrfllP>P-(B’'  ^)a|P>IIMS>'  A.)y*lP>|16-(BV  A,)y*f, 
which  proves  (2-12). 
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Now  we  need  to  consider  (2-13),  the  sign  of  y*.  We  will  prove  that  y*  >  0 
by  contradiction,  so  assume  that  y*  <  0.  We  will  form  a  convex  combination  of 
y*  and  y*  as  follows.  Let  z  be  defined  as 

^  \0, 

This  gives  us 

||fr-(Bv  >l.)y*||2=  b-(By  ^)^^.)p>||6-(BV 
Now  form  the  convex  combination 

Because  yj  <  0  and  y*  >  0,  we  can  pick  A  6  (0, 1]  such  that  cf  =  0.  Fact  2.5 
says  that 

||6-(BV  >l,)y-||2>||6-(BV  A,)c^\\^>\\b-(By  A,)y‘\\\ 

However,  this  would  imply  that 

||6-Bxb|P>||6-(Bv  ^)c*||^ 

which  when  cj  =  0,  can  be  rewritten  as 

||6-Bxb||2>  b-{By  b-(By 

and  thus 

||6-Bxb  |p>||6-BVc^|p. 

However,  By  is  a  proper  subset  of  the  columns  of  B,  and  xb  is  the  least-squares 
solution  to 

min  II 6-  Bxb  H^, 

so  that  we  know 

\\b-BxB\\^<\\b~Byc%f. 

This  is  our  contradiction.  Thus  we  must  have  y*  >  0,  and  (2-13)  has  been 
shown  to  be  true.  We  can  therefore  set  x'  =  c,  B’'  =  B*',  y*  =  y*,  and  go  back 
atnd  check  this  new  y*  against  Cases  1-3.  Note  that  if  Case  3  again  applies  and 
additional  columns  must  be  dropped,  we  will  have  the  following  upon  entry: 

||6-Bxb||^>||6-(B»  ^)x'||2>||6-(Bv  ^)y*||^ 
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and  when  a  new  convex  combination  c  is  formed,  we  will  find 
||6-Bxb||^>||6-(Bv  ^,)*'j|2>||6-(Bv  ^  )  c||2  >  ||  6-(  5^  A,)y*f 


by  Fact  2.5.  Thus  (2-12)  will  still  hold  in  subsequent  passes  through  Case  3. 
We  repeat  this  process  of  checking  the  sign  of  y*  and  dropping  columns  until 
Case  1  or  Case  2  is  satisfied. 

We  will  now  show  that  y*  will  eventually  satisfy  Case  1  or  Case  2.  We 
know  that  y*  >  0,  no  matter  how  many  convex  combinations  are  formed,  and 
no  matter  how  many  times  y*  and  fl®  are  consequently  redefined.  Therefore 
column  A,  is  always  retained  during  this  process.  Thus  we  can  conclude  that  in 
the  worst  case,  we  will  find  y*  >  0  when  B*'  =  0  (and  A,  is  the  only  remaining 
column  in  B),  at  which  time  Case  1  is  satisfied.  □ 

Note  that  Theorem  2.24  did  not  disallow  the  possibility  that  all  columns  be 
dropped  from  A^  before  we  can  find  a  y*  >  0.  That  is,  it  implies  that  it  could 
be  possible  that  {>!<;}  =  A,  and  =  y*.  We  will  prove  that  this  is  actually 
noi  possible. 

Corollary  2.25 

If  the  first  column  ever  added  to  B  is  Ar  where  r  is  determined  by  (2-6)  in 
Theorem  2.15,  then  y*  will  satisfy  Case  1  or  2  before  all  columns  other  than  A, 
are  dropped  from  the  basis. 

Proof:  Assume  that  y*  never  satisfies  Case  1  or  Case  2  until  only  column  A, 
is  remaining  (and  y*  =  yj  >  0).  From  Theorem  2.24,  we  know  that 

||6-Bxb  ||^>||6-Bxb  ||2, 

which  (if  all  columns  other  than  A,  are  dropped)  can  be  rewritten  as 

||6-BxB||2>||6-A,y:||^ 

Since  Ar  was  the  first  column  placed  in  B,  we  know  from  Theorem  2.24  that 

\\b-Ary;f>\\b-BxBf, 
where  y*  >  0  is  the  solution  to 

min  II  b  -  ArPr  ||^. 


Thus 

which  implies 


\\b-Ary:f>\\b-BxBf>\\b-A,y:\\\ 

||6-A,y;||^>||6-A.y;||^ 


However,  Ar  was  chosen  to  be  the  solution  to  (2-5)  (see  Theorem  2.15).  Thus 
we  have  a  contradiction.  □ 
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2.7  Infeasibility 

Until  now,  we  have  not  considered  the  feasibility  or  infeasibility  of  (2-1).  This 
section  discusses  the  issue  of  infeasibility  and  how  it  affects  the  solution  to  the 
least-squares  problems  considered  in  Theorems  2.15  and  2.16,  as  well  as  other 
related  issues. 

Fact  2.26 

If  A^b  <  0,  we  can  conclude  that  (2-1)  is  infeasible. 

Proof:  Given  that  6^0,  this  follows  directly  from  Farkas’  Lemma.  □ 

Fact  2.26  implies  that  if  in  the  process  of  finding  the  solution  to  (2-5)  (see 
Theorem  2.15),  we  find  that  Ajb  <  0  V  j,  then  we  have  established  that  (2-1) 
is  infeasible. 

Fact  2.27 

Let  A  be  as  in  Fact  2.26.  Let  v  ^  0  be  closer  to  b  than  any  Xv,  A  ^  1,  and  let 
u  =  b  —  V.  If  u  ^  0  and  u^A  <  0,  then  the  original  problem  (2-1)  is  infeasible. 

Proof:  We  know  from  Fact  2.2  that  b'^n  =  vTu,  and  we  know  that  u  ^  0. 
Therefore  this  follows  directly  from  Farkas’  Lemma.  □ 

Corollary  2.28 

Let  V  be  a  basic  weighting  corresponding  to  basis  B  and  let  u  =  b  —  v.  If 
u^Aj  <0  ^  Aj  ^  B,  then  (2-1)  is  infeasible. 

Proof:  We  know  that  v  is  a  basic  weighting  of  B,  so  we  can  conclude  from 
Theorem  2.18  that  u^Aj  =  O'i  Aj  £  B.  Thus  we  have  u^Aj  <  0  V  Aj.  We  can 
see  by  Fact  2.27  that  (2-1)  is  thus  infeasible.  □ 

Theorem  2.29 

Let  u  =  Bxb  be  a  basic  weighting  corresponding  to  the  basis  B  and  let  u  =  b  —  v. 
If  infeasibility  is  detected  (i.e.,  if  u^Aj  <0  'd  Aj  ^  B),  then  v  is  at  least  as 
close  to  b  as  is  any  other  nonnegative  weighting  of  the  columrs  of  A. 

Proof:  Let  BI  be  the  set  of  indices  such  that  j  e  Bl  if  and  only  if  Aj  €  B. 
We  can  write  v  as 

j^Bi 

where  (re);  is  the  component  of  xg  corresponding  to  Aj. 

Since  we  have  discovered  infeasibility,  we  know  that  u  ^  0.  We  also  know 
that  V  is  closer  to  6  than  is  any  Avb,  A  ^  1  because  xg  is  the  least-squares 
solution  to 

II  ^ 

7€fl/ 
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We  therefore  know: 


u^b  =  u^u  by  Fact  2.2 
u^Aj  =  0  ^  j  £  BI  by  Theorem  2.18 

u^Aj  <  0  'i  j  ^  BI  because  infeaisibility  has  been  detected. 

Let 

n 

V  =  ^Ajxj 
i=i 

be  any  other  nonnegative  weighting  of  the  columns  of  A.  Therefore  we  have 
Xj  >  0  V  j.  Let  u  =  b  —  V.  Consider 

U^UB  =  U^b  -  AjXj 

i=i 

=  u^u  —  u^AjXj  —  u^AjXj 

j^Bl  jgBI 

=  u^u  -  ^  viFAjXj  >  u^u. 

jgBt 

Now  consider 

u\  =  II  u  nil  u  11;  u^«  =  ||u|l||u||cos0, 

where  6  is  the  angle  between  u  and  u.  We  have 

||u||||«||cosfl  >  II  U  nil  u  II  =>  llwllllwllcos^  >  nw||ll«ll. 
which  can  be  extended  to 

n«iiii«n  >  n«nn«ncos^  >  n«iiii«n  >  n«nn«iicos». 

Therefore  ti^u  >  u^u  >  uTu.  So  we  can  conclude  that  v  is  at  least  as  close 
to  6  as  is  any  other  nonnegative  weighting  of  the  columns  of  A.  □ 

Corollary  2.30 

Given  B,  v  and  u  as  in  Theorem  2.29.  If  v  is  closer  to  b  than  is  any  other 
nonnegative  weighting  of  the  columns  of  A,  then  infeasibility  will  be  detected 
(that  is,  we  will  find  u^Aj  <  0  V  j  ^  S/ 

Proof:  Assume  that  u^A,  >  0  for  some  A,  ^  B.  If  we  add  A,  to  B  and  drop 
columns  as  described  in  Section  2.6,  then  we  will  find  a  B  and  xg  >  0  c  .n 
that  V  =  Bxb  is  closer  to  6  than  is  v.  This  contradicts  the  assumption  that  v 
is  closer  to  6  than  is  any  other  nonnegative  weighting  of  the  columns  of  A.  □ 
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2.8  The  Least-Squares  Phase  I  Algorithm 

As  stated  in  Section  2.1,  the  main  objective  of  the  Phase  I  algorithm  to  be 
presented  here  is  to  build  a  basis  matrix,  made  up  of  selected  columns  of  A, 
and  to  find  positive  weightings  of  these  columns  such  that  the  basis  and  the 
associated  positive  weightings  are  a  solution  to  (2-2).  Remember  that  we  have 
rescaled  (2-1)  such  that  ||  Aj  ||  =  1  V  j  and  |16||  =  1- 

At  the  start  of  an  iteration,  we  are  given  a  basis  B,  composed  of  a  linearly 
independent  subset  of  the  columns  of  A,  with  the  property  that  the  least-squares 
weighting  of  these  columns  which  yields  the  closest  approximation  to  the  right- 
hand  side  6  is  a  positive  weighting.  The  iterative  step  tests  whether  the  current 
approximation  is  the  closest  approximation  with  a  positive  weighting,  and  if  not, 
selects  an  incoming  column  with  the  property  that  a  positive  combination  of  the 
previous  approximation  and  the  incoming  column  is  a  strictly  better  approxi¬ 
mation  to  the  right-hand  side.  There  exists,  however,  a  (possibly  proper)  subset 
of  the  columns  of  the  augmented  basis  such  that  the  least-squares  weighting  is 
positive,  and  is  also  a  better  approximation  to  the  right-hand  side  than  the 
simple  positive  combination.  This  subset  of  the  augmented  set  of  basic  columns 
can  be  used  to  start  the  next  iteration. 

Algorithm  Pseudocode 

Notation  and  Variables: 

A:  The  original  m  x  n  matrix  of  (2-1). 

6:  The  original  right-hand  side  of  (2-1). 

nbi:  The  number  of  currently  basic  columns  (the  number  of  columns  in  B). 

B".  The  matrix  of  currently  basic  columns,  m  x  nbi  of  B  is  used,  and  nbi  can 
be  as  large  as  m. 

BI:  An  m-vector  containing  indices  of  the  columns  of  B  relating  them  to  the 
columns  of  A.  For  example,  if  B/(«]  =  j,  the  ith  coluiiui  of  B  contains 
the  jth  column  of  A.  BI[i]  —  0  means  no  column  corresponds  to  the  ith 
component  of  BI,  which  implies  there  must  be  less  than  i  basic  columns. 

scb:  The  current  positive  weightings  of  the  columns  of  B.  The  first  nbi  com¬ 
ponents  of  xb  (corresponding  to  the  nbi  columns  in  B)  are  used,  and  nbi 
can  be  as  large  as  m. 

v:  The  current  approximation  Barg. 

«:  The  current  residual  b  —  v. 

y:  The  current  least-squares  solution  of  min||6—  Bylp.  The  first  nbi  compo¬ 
nents  of  y  are  used,  and  n6i  can  be  as  large  as  m. 
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X:  The  vector  to  hold  the  solution  to  (2-1).  Note  that  xb  holds  the  solution 
to  (2-2). 


0.  Initialization  {no  columns  initially  in  basis) 

B  :  =  9  {no  columns  in  basis) 
xg  :  =  0  {no  current  basic  solution) 

BI  :=  <6  {no  basic  indices) 

nbi  :  =  0  {set  number  of  ba.  ic  columns  to  0) 

u  :  =  6  {set  residual  to  6) 

V  :  =  0  {set  current  approximation  v  =  Bxb  =  0) 
yV  ;  =  0  {set  final  solution  to  0} 

1.  Startup  {find  the  first  column  to  place  in  B) 

If b^Aj  <0  Vj=-l,...,n  then 

Go  to  4.  {the  problem  is  infeasible) 
s  :=  argmaxj 

Place  A,  in  the  first  column  of  B. 

B/[l]  :  =  s  {record  first  column) 
y[l]  :  =  b^A,  {record  basic  solution) 
nbi  i  =  1  {set  number  of  basic  columns) 

2.  Main  Loop  {Add  columns  to  B  until  u  =  0  or  infeas.  is  discovered.) 

Xb  -  ^y 

V  :  =  Bxb  (set  current  approximation) 
u  :  =  b  —  V  {set  current  residual) 

If  u  =  0  then 

Go  to  4.  {solution  found) 

If  u%  <0  ^  j^BI  then 

Go  to  4  {infeasibility  discovered) 
n6i  :  =  nbi  -(-  1  {increment  number  of  basic  columns) 


s  =  argmax 

itBI 

j  aJ,>o 


Aju 

{v^v  -  iATv^y'^ 


Place  A,  in  column  nbi  of  B. 

BI[nbi]:  =  s  {record  position  of  incoming  column.) 

3.  Least-Squares  Loop 

Solve  II  By  -  6 1|*  for  y 
If  y  >  0  then 

Go  to  2.  {new  B  and  xb  found;  repeat  Main  Loop) 
If  y  >  0  then 
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Remove  all  columns  from  B  and  elements  from  xb  that 
correspond  to  zero  components  of  y.  Update  BI  accordingly. 
Go  to  2.  {new  B  and  xb  found;  repeat  Main  Loop} 

A  :  =  maXj,.<o  -yi/(c,  -  y.) 

r  :  =  Ac  +  (1  —  X)y  (form  convex  combination  of  c  and  y  } 

Remove  all  columns  from  B  and  elements  from  xb  and  c  that 
correspond  to  zero  components  of  c.  Update  BI  accordingly. 
C«o  to  3.  (repeat  Least-Squares  loop) 

4.  Done  (clean  up} 

for  i  :  =  1  to  nbi  do  (fo’ m  the  answer  to  the  original  problem} 
Report  on  feasibility  and  output  the  solution  found. 


We  now  make  the  following  observations: 

1.  The  approximation  v  =  Bxb  is  strictly  closer  to  6  at  each  iteration  (see 
Theorem  2.24).  This  means  that  degeneracy  does  not  cause  problems; 
cycling  cannot  occur,  and  the  process  terminates  in  a  finite  number  of 
steps  because  given  B,  xb  is  uniquely  determined,  and  there  are  a  finite 
number  of  bases  B. 

2.  If  infeasibility  is  detected,  then  the  current  approximation  r  =  Bxb  is 
closer  to  6  than  is  any  other  nonnegative  weighting  of  the  columns  of  A 
(see  Theorem  2.29).  Similarly,  if  u  ^  0  eind  v  =  Bxb  is  closer  to  6  than 
is  any  other  nonnegative  weighting  of  the  columns  of  A,  then  infeasibility 
will  be  detected  when  trying  to  add  a  new  column  (see  Corollary  2.30). 

3.  Whenever  a  new  column  is  chosen  from  A  to  enter  B,  it  is  linearly  in¬ 
dependent  of  the  columns  currently  in  B  (see  Corollary  2.19).  Also  due 
to  the  linear  independence  of  entering  columns,  our  least-squares  solution 
(the  solution  to  min  (|  6  —  By  (()  is  always  unique,  so  numerical  difficulties 
aiside,  rank  deficient  least-squares  problems  are  never  encountered. 

2.9  Variations 

Free  Variables 

Consider  the  system 


Ax  —  6 

Xi  >  0  for  some  set  of  i  (2-14) 

Xj  free  for  some  set  of  j 

This  system  could  always  be  converted  to  the  form  of  (2-1)  by  replacing  each  free 
variable  xj  with  two  new  nonnegative  variables  x}  and  x"  where  the  original  Xj 
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is  now  represented  by  Xj  —x".  However,  let  us  consider  how  to  solve  a  feasibility 
problem  with  free  variables  without  actually  adding  any  more  variables. 

In  the  pure  algorithm  described  in  Section  2.8,  the  entering  columns  is  ^4, 
where  s  is  determined  by  (2-8).  This  assumes  that  all  variables  are  nonnegative. 
If  free  variables  are  present,  we  can  write  the  rule  for  selecting  the  incoming 
column  s  as 


s  =  argmax 


max 

•*.>0 

I 


AJu 


\Aj^\  1 

max  - ^ - TTT  > 

(v'^v  -  (AJv)^^  J 


Therefore  we  see  that  we  can  select  an  incoming  column  when  free  variables 
are  present  without  actually  adding  new  variables  and  columns.  Infeasibility  is 
detected  when  Aju  <0  V  i :  >  0,  and  Aju  =  0  V  j  :  xj  is  free. 

The  only  other  change  in  the  algorithm  concerns  testing  the  sign  of  y  in 
the  Least-Squares  Loop  part  of  the  algorithm.  Elements  of  y  corresponding  to 
free  variables  may  take  on  any  nonzero  value.  If  such  an  element  of  y  is  found 
to  be  zero,  it  is  dropped  just  as  any  other  zero  element  of  y  would  be.  The 
only  components  of  y  that  must  be  strictly  positive  are  those  corresponding 
to  variables  restricted  to  be  nonnegative.  Thus,  when  A  is  formed,  the  only 
components  yi  <  0  that  are  considered  are  those  corresponding  to  nonnegative 
variables. 

All  the  results  previously  discussed  (strict  improvement,  infeasibility  de¬ 
tection,  etc.)  still  hold,  as  these  modifications  to  the  pure  algorithm  simply 
implement  the  result  of  splitting  free  variables  without  actually  doing  the  work 
of  adding  variables  and  columns. 


No  Denominator 

Consider  the  selection  of  the  incoming  column  A,.  We  must  form  Aju  for  all 
columns  not  currently  in  the  basis,  and  (2-8)  for  all  columns  such  that  Aju  > 
0.  As  the  denominator  is  never  directly  used  in  any  of  the  proofs  relating  to 
infeasibility,  strict  improvement,  etc.,  if  we  replace  the  denominator  of  (2-8) 
with  1,  we  produce  a  perfectly  valid  column-selection  rule  that  requires  less 
computation  per  candidate  column. 

It  would  be  expected  that  this  modified  rule  might  not  perform  as  well  as  the 
pure  rule,  since  we  may  not  be  selecting  the  A,  that  minimizes  (2-7).  However, 
we  can  still  guau’antee  (a, /I?)  >  0  because  Aju  >  0. 

As  it  turns  out,  the  results  produced  (on  test  problems  to  be  described 
later)  show  very  little  difference  in  iteration  count  between  either  of  these  two 
variations  (“no  denominator”  and  “free  variable”)  and  the  pure  algorithm.  Ap¬ 
parently,  it  is  the  direction  of  the  candidate  incoming  column  that  is  important, 
not  the  particular  scaling  factor  used  in  the  column-selection  rule. 
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Crash  Basis 

Until  now,  we  have  built  up  the  basis  B  starting  from  scratch — that  is,  initially 
B  contains  no  columns.  If  we  could  pick  out  a  starting  set  of  linearly  independent 
columns  without  much  effort,  we  could  start  with  them  already  in  B. 

Consider  multiplying  by  —  1  eJl  rows  of  A  (and  elements  of  b)  corresponding 
to  negative  elements  of  6.  This  gives  us  a  positive  right-hand  side.  Now  con¬ 
sider  selecting  all  colunuis  of  this  modified  A  that  are  columns  of  the  identity 
matrix,  i.e.,  columns  of  the  form  Cj,  where  e  is  a  vector  of  zeros  with  a  1  in 
the  jth  position.  (Note  that  most  columns  of  this  form  will  correspond  to  slack 
variables.)  If  we  place  all  such  columns  in  B,  and  eliminate  duplicate  columns, 
we  will  get  the  problem  min  ||  6  —  Bx  ||,  and  the  solution  xb  will  consist  of  the 
elements  of  6  corresponding  to  the  nonzero  elements  of  the  identity  columns. 

All  of  the  theory  for  the  pure  algorithm  holds  for  this  method  sts  well,  with 
the  exception  of  Corollary  2.25.  Since  we  do  not  know  that  the  column  As 
satisfying  (2-5)  was  the  first  colunm  selected  to  enter  the  basis,  we  can  no 
longer  guarantee  that  all  columns  (other  than  the  incoming  column)  will  not 
be  dropped.  In  our  computational  experience,  such  dropping  of  all  columns  has 
not  been  observed,  and  is  thus  presumed  to  be  rare  in  practice. 

2.10  Discussion 

We  have  proved  in  Theorem  2.29  and  Corollary  2.30  that  if  (2-1)  is  infeasible, 
we  will  detect  such  infeasibility  when  our  current  solution  Xb  >  0  is  as  close  as 
possible  to  the  right-hand  side  b.  That  is,  v  =  Bxb  is  at  least  as  close  to  6  as 
is  any  other  nonnegative  weighting  of  the  columns  of  A.  Thus  we  see  that  the 
least-squares  algorithm  actually  solves  the  problem 

min  ^116- Ax  1!^  (2-15) 

s.t.  X  >  0. 

If  the  least-squares  algorithm  finds  a  feasible  solution  to  (2-1),  then  the  optimal 
objective  value  of  (2-15)  is  0.  Otherwise,  the  optimal  objective  of  (2-15)  is 
positive,  we  have  an  optimal  solution  xb,  and  no  feasible  solution  exists  to  (2- 
!)• 

Problem  (2-15)  is  often  called  the  “non-negative  least-squares”  problem 
(NNLS),  and  can  also  be  considered  a  degenerate  case  of  the  “bounded  least- 
squares”  problem  (BLS),  in  which  x  can  have  both  upper  and  lower  bounds. 
Lawson  and  Hanson  [12]  present  an  algorithm  for  NNLS  that  is  very  closely 
related  to  our  algorithm.  The  only  differences  between  the  algorithms  is  that 
first,  a  denominator  of  “1”  is  used  in  the  column  selection  rule  of  NNLS.  Sec¬ 
ondly,  after  an  incoming  column  A,  has  been  successfully  introduced  into  the 
basis  (and  all  necessary  dropping  of  other  basic  columns  is  complete),  our  al¬ 
gorithm  takes  all  basic  variables  with  a  zero  value  and  makes  them  nonbasic, 
while  Lawson  and  Hanson’s  NNLS  algorithm  does  not. 
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We  implemented  both  the  NNLS  algorithm  and  the  “denominator-of-1”  vari¬ 
ation  of  the  least-squares  algorithm,  and  tested  the  efficiency  of  the  algorithms 
on  the  smallest  30  linear  programming  problems  available  from  NETLIB.  (See  [7] 
and  Section  3  for  more  details  about  these  problems.)  In  practice,  these  two 
^llgorithms  rarely  chose  different  incoming  columns.  A  substantial  difference 
between  the  two  algorithms  was  observed  only  once,  on  problem  SCSD6,  which 
is  highly  degenerate  in  the  simplex  method  sense.  It  also  appears  to  be  the  case 
(in  practice)  that  the  NNLS  algorithm  will  occasionally  have  the  same  final 
basic  index  set  as  does  the  least-squares  algorithm  with  an  additional  handful 
of  basic  variables.  In  our  experiments,  these  additional  basic  variables  were  all 
at  a  very  low  level  (on  the  order  of  10“*°  or  less).  Thus  the  least-squares  al¬ 
gorithm  occasionally  found  a  “cleaner”  solution  than  did  the  NNLS  algorithm, 
apparently  due  to  numerical  error. 

Bjdrck  [3]  presents  an  algorithm  for  BLS,  that  when  applied  to  NNLS,  is  even 
closer  to  our  algorithm  that  is  Lawson  and  Hanson’s  algorithm  for  NNLS.  The 
only  difference  between  the  BLS  algorithm  and  the  least-squares  algorithm  is 
that  a  denominator  of  “1”  is  used  in  the  column  selection  rule  of  NNLS.  There¬ 
fore,  the  “denominator-of-l”  version  of  the  least-squares  algorithm  is  equivalent 
to  the  BLS  algorithm.  Neither  [3]  nor  [12]  give  any  numerical  results  for  this 
algorithm. 

We  also  found  the  following  interesting  equivalence  between  the  least-squares 
algorithm  and  the  convex  quadratic  programming  algorithm  due  to  Dantzig  [5] 
and  V2in  de  Panne  &  Whinston  [14].  Before  we  can  show  this  equivalence,  we 
must  first  reformulate  (2-15)  as  a  quadratic  programming  problem. 

The  constrained  least-squares  (2-15)  has  the  following  necessary  and  suffi¬ 
cient  conditions  for  optimality; 

-A^6  +  A^Ax  >  0 
x  >  0 

x^(-A^b  -I-  A^Ax)  =  0. 

These  conditions  can  be  formulated  as  the  Linear  Complementarity  Problem 
(LCP) 


q  -f  Mx  >  0 

X  >  0 

x^{q  -I-  Mx)  =  0, 


(2-16) 


where  q  =  —A^b  and  M  =  A^A.  Because  this  M  is  symmetric  and  positive 
semi-definite,  this  LCP  is  equivalent  to  the  queidratic  program 

min  q^x  -I-  ^x^Mx  =  c  (2-17) 

s.t.  X  >  0. 
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(This  problem  could,  of  course,  be  derived  directly  from  (2-15).) 

Accordingly,  emy  convex  quadratic  programming  algorithm  would  be  appli¬ 
cable  to  (2-17).  Although  we  are  going  to  demonstrate  an  equivalence  between 
the  quadratic  programming  algorithm  developed  by  Dantzig  and  by  Van  de 
Panne  k  Whinston,  the  very  same  algorithnnic  ideas  are  developed  by  Goldf2irb 
k  Idnani  [10]  from  an  active-set  point  of  view.  (The  following  description  of 
this  algorithm  is  adapted  from  the  book  by  Cottle,  Pang  and  Stone  [4].) 

The  Quadratic  Programmiug  Algorithm 

The  algorithm  is  based  on  principal  pivotal  transforms  of  the  system  y  =  q-\-Mx, 
which  can  be  written  in  tableau  form  as 


2c 

y 


where  c  is  as  in  (2-17).  At  any  point  in  the  algorithm,  we  will  have  two  index  sets 
a  and  corresponding  to  basic  and  nonbasic  x  variables,  respectively.  Thus  if 
we  partition  and  relabel  the  original  tableau  as 


1  X 


0 

M 

1 

ya 

2c 

0 

qp 

7a 

yp 

qp 

Mpp 

Xa 

7a 

A/ota 

then  we  can  write  a  general  tableau  during  the  algorithm  in  terms  of  the  original 
tableau  as 


1 

Xp 

ya 

2c 

-7a^aa7a 

qp  -qi^aa  Map 

q'cMaa 

yp 

qp  -  MpaM~^qa 

Mpp  -  MpaM~^Map 

MpcM-^ 

Xc, 

-M-^qo 

-M-^MaP 

From  this  tableau,  we  can  see  that  when  the  nonbasic  variables  xp  and  !/„  are 
set  to  0,  the  basic  variables  x^  and  yp  can  be  expressed  as 

yp  =  qp-MpaM~^qa. 

We  will  rewrite  the  general  tableau  (after  k  principal  pivots)  as 
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1  4  y^a 


e‘‘ 

i4y 

4 

Mf 

in  order  to  ease  the  notation  when  examining  the  algorithm  in  more  detail. 
We  now  describe  the  quadratic  programming  algorithm  as  applied  to  (2-17)  in 
pseudocode. 

0.  Initialization. 

:=(q,M) 

k:=0 
a  :  =  0 

/?  :=  {l,...,n} 

1.  Check  stopping  conditions. 

s  :  =  argmin{9,^  :  »  €  /?} 

If  Is  ^  0>  solution  is  Zo,  =  g*,  xp  =  0. 

If  q^  <  0,  choose  zj  as  the  incoming  variable. 

Note  that  we  do  not  need  to  check  for  infeasibility  here, 
as  (2-17)  always  has  a  solution. 

2.  Determine  outgoing  (blocking)  basic  z  variables,  if  any. 

Let 


7  = 


min 

•  ((";,)0.<o 


where  (M*^),  is  the  sth  column  of  Af*^, 

((M*^),),  is  the  ith  component  of  that  column,  and 
similarly  (g*)i  is  the  rth  component  of  g*. 

3.  Perform  a  pivot. 

If 


7> 


then  there  b  no  outgoing  basic  z  variable.  In  this  case. 

Move  index  s  from  index  set  P  to  a. 

Set  k  =  k  +  1. 

Go  to  Step  1. 

Otherwise,  let  r  be  the  index  corresponding  to  the  minimum  ratio  y 
Move  index  r  from  index  set  a  to  p. 

Set  ib  =  it  -f  1. 

Go  to  Step  2. 
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Further  discussion  of  this  algorithm  can  be  found  in  Cottle,  Pang  fc  Stone  [4]. 

We  now  demonstrate  the  equivalences  between  this  algorithm  and  the  no¬ 
denominator  version  of  the  least-squares  algorithm.  Let  us  assume  that  at  some 
point  in  both  algorithms  we  have  the  same  index  sets  a  and  /?  corresponding 
to  basic  and  nonbasic  x  variables,  respectively.  We  can  rewrite  the  partitioned 
original  tableau  in  terms  of  our  least-squares  algorithm  as 


1 

X0 

Vc, 

2c 

0 

-b^N 

-b‘B 

y0 

-N‘b 

N^N 

~Wb~ 

Xa 

-Bn 

B‘*N 

B’^'B 

Note  that  =  —N'^b,  and  so  on.  Similarly,  we  rewrite  the  general  tableau  as 


^0  Va 


-b‘B{B^B)'^Bn 

-b‘N  +  B{E‘  B)^  B‘  N 

-Nn+  N‘  B{B^B)-^Bn 

N>N  -  N‘B{B'^'B)-^  B‘N 

ffinfiTM 

-(B^B)-'B‘N 

(B'B)-' 

Thus  we  see  that  Xq  =  —M~^qa  =  B^b  =  xg,  and  the  two  solutions 

are  the  same.  Therefore,  given  the  same  index  set  of  basic  x  variables,  both  the 
quadratic  programming  algorithm  and  the  least-squares  algorithm  produce  the 
same  solution  x<,  =  xb- 

To  further  compare  the  two  algorithms,  assume  that  again  both  algorithms 
have  the  same  index  sets  o  and  corresponding  to  the  beisic  and  nonbasic  x 
variables.  We  will  consider  which  nonbasic  x  variable  is  chosen  to  become  basic 
in  both  algorithms. 

In  the  quadratic  programming  algorithm  we  select  incoming  variable  x,, 
where 


s  =  argmin{g,*}, 

=  argmin  { {q^  -  } 

=  argmin  { —Sjb  -t-  nJBxb  } 
i€^ 

=  argmin  {  —NJu}  . 

If  we  find  that  g*  =  —Nju  >  0,  then  in  the  quadratic  programming  algorithm, 
we  know  we  have  a  solution  to  (2-17).  Similarly,  if  NJu  <  0,  the  least-squares 
algorithm  has  either  found  a  solution  to  (2-1)  (in  which  case  u  =  0),  or  has 
detected  infeasibility  of  (2-1).  In  either  case,  the  least-squares  algorithm  has 
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found  a  solution  to  (2-17).  Thus  N]^u  <  0  is  the  termination  condition  for 
both  algorithms.  (Note  that  the  quadratic  programming  algorithm  never  finds 
infeasibility,  as  (2-17)  always  has  a  solution.)  On  the  other  hand,  if  = 
—NJ^u  <  0,  then  x,  is  the  incoming  variable  for  both  problems.  Thus  the  two 
algorithms  have  the  same  termination  conditions  and  the  same  column-selection 
rules. 

Given  that  we  start  an  iteration  with  the  same  index  sets,  and  that  the 
same  incoming  variable  x,  is  selected,  we  will  now  consider  how  the  two  algo¬ 
rithms  proceed.  In  the  least-squares  algorithm,  x,  simply  becomes  basic,  even 
though  this  could  cause  other  basic  variables  to  become  negative.  However,  the 
quadratic  programming  algorithm  checks  for  and  pivots  out  all  (if  any)  variables 
which  would  become  negative,  only  then  making  x,  basic.  If  there  is  no  outgo¬ 
ing  basic  variable,  then  both  algorithms  add  x,  to  the  set  of  basic  variables,  all 
other  basic  variables  remain  nonnegative,  and  the  edgorithms  again  share  the 
same  index  sets. 

Assume  now  that  bringing  x,  into  the  set  of  basic  x  variables  will  cause  at 
least  one  of  the  other  basic  variables  to  become  negative.  In  the  case  of  the  least- 
squares  algorithm,  this  situation  means  that  at  least  one  of  the  previously  basic  x 
variables  is  nonnegative,  and  at  least  one  of  these  negative  basic  variables  must 
be  dropped  from  the  basis.  Similarly,  the  quadratic  programming  algorithm 
must  find  and  eliminate  from  the  basis  at  least  one  outgoing  (blocking)  variable. 
We  will  now  demonstrate  that  given  the  same  index  sets,  and  the  same  incoming 
variable  x,,  if  a  column  must  be  dropped  from  the  basis  in  order  to  maintain 
nonnegativity  of  the  basic  x  variables,  both  algorithms  will  select  the  same  basic 
X  variable  to  be  removed  from  the  basis. 

The  quadratic  programming  algorithm  will  select  Xr  to  become  nonbasic, 
where 


r 


argmin 


argmin 

( 

|<ol 

argmin 


(^B)i 

-(dD)i’ 


where  dp  =  On  the  other  hand,  the  least-squares  algorithm 

will  select  Xr  to  become  nonbasic,  where 


r 


argmax 

y.<o 


-Vi 

(*B)i  -  Vi 


=  argmin 

yi<o 


{xB)i 

-Vi 
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From  Theorem  2.24  we  know  that  the  last  component  of  y  is  strictly  positive, 
and  is  thus  not  considered  in  the  minimum-ratio  test.  After  much  algebraic 
manipulation,  we  can  again  rewrite  the  expression  for  this  minimum-ratio  test 
as 


r  = 


argmin 

(l6-7<iD),<0 


-(XB  -fdoh' 


where 


7  = 


c 

d’ 


c  =  NTv 


d  =  -  nJn,. 


In  order  to  demonstrate  that  both  minimum-ratio  tests  must  select  the  same 
index  r,  we  will  derive  conditions  that  must  be  met  in  order  for  a  different  index 
to  be  chosen.  This  will  produce  a  contradiction. 

Assume  that  for  some  indices  j  and  k,  we  have 

i^B)j  {XB)k  .  (xb)]  (xB)k 

-(do)]  -(do)):  -(^B-7do)j  -(xB-7do)k' 

where 

(dD)j  <0;  (dD)k<0;  (xb  —  idp)}  <  O',  (xB-7dD)k<0. 

This  can  only  be  true  if  we  have  both 


-(do)]  ^  -(dD)k 
{XB)j  "  (XB)k 

and 

-jido)}  ^  -l(dD)i 
{xB)j  "  (XB)k 

which  can  only  be  true  if  7  >  0 

and 

(do);  <0 

and 

{do)j  >  -(XB)j 

(do)*  <  0 

and 

{dD)k  >  -{XB)k- 

We  know  that  that  all  of  these  conditions  cannot  be  satisfied  simultaneously  if 
7  >  0,  as  we  know  xb  >  0.  Therefore,  both  algorithms  select  the  same  index  r, 
thus  choosing  the  same  basic  x  variable  to  become  nonbasic. 

At  this  point,  the  basic  variable  Xr  is  dropped  from  the  basis  in  both  al¬ 
gorithms,  moving  index  r  from  the  basic  to  the  nonbasic  index  set.  In  the 
quadratic  programming  algorithm,  the  next  outgoing  (blocking)  basic  variable 
must  be  determined,  as  x,  has  not  yet  been  brought  into  the  basis.  In  the  least- 
squares  algorithm,  things  are  slightly  different  because  the  incoming  variable  x, 
is  already  basic.  However,  once  Xr  is  dropped  from  the  basis  and  a  new  value  for 


y  is  computed  by  the  least-squares  algorithm,  we  are  back  in  the  same  situation 
as  described  earUer.  This  time  we  have  a  smaller  set  of  basic  variables,  and 
by  the  same  argument  given  above,  both  algorithms  will  again  select  the  same 
basic  X  variable  to  leave  the  basis.  Thus  we  may  conclude  that  given  the  same 
initial  index  sets  a  and  P,  the  same  incoming  column  is  selected,  and  the  same 
series  of  variables  is  dropped  from  the  basis.  Once  all  necessary  basic  columns 
have  been  dropped,  the  quadratic  programming  algorithm  finally  brings  x,  into 
the  basis,  and  both  algorithms  again  share  the  same  sets  of  basic  and  nonbasic 
indices. 

Therefore  we  see  that  the  only  difference  between  the  two  algorithms  is  the 
fact  that  after  x,  has  been  successfully  introduced  into  the  basis  (and  all  drop¬ 
ping  of  basic  X  variables  is  complete),  the  least-squares  algorithm  takes  all  basic 
variables  with  a  zero  value  and  makes  them  nonbasic,  while  the  quadratic  pro¬ 
gramming  algorithm  does  not.  In  this  degenerate  situation,  the  two  algorithms 
can  develop  different  index  sets,  and  can  thus  take  a  different  path  to  find  a 
different  final  solution. 

Notice  that  we  have  also  just  proved  that  this  quadratic  programming  algo¬ 
rithm  is  equivalent  to  the  NNLS  algorithm  in  Lawson  k.  Hanson  [12]. 

3  Computational  Results 

In  this  section,  we  consider  the  actual  performance  of  the  algorithm  developed 
in  Section  2.  All  computational  results  have  been  obtained  using  the  30  smallest 
linear  programming  test  problems  available  from  NETLIB  [7].  These  test  prob¬ 
lems  were  converted  to  the  form  of  (2-1),  with  the  addition  that  free  variables 
were  allowed  (although  nonpositive  vewiables  were  not).  This  produced  a  set 
of  equivalent  problems  with  modified  dimensions;  see  Table  3-1  for  the  original 
and  modified  dimensions.  For  bitmaps  of  the  nonzero  patterns  of  many  of  the 
original  problems,  see  [13]. 

The  test  environment  consisted  of  the  following  hardware  and  software: 

Computer:  HP  9000/835 

32  M  main  memory 
lOOM  virtual  memory 

Operating  System:  HP-UX  7.0 

Language:  HP  FORTRAN,  64-bit  real  numbers 

The  run  times  reported  here  include  only  computation  time,  as  the  amount  of 
main  memory  used  virtually  eliminated  swapping,  and  I/O  time  was  negligible. 

3.1  The  Least-Squares  Phase  I  Algorithm 

This  section  considers  the  computational  results  of  the  least-squares  Phase  I 
algorithm  and  its  variations,  as  developed  in  Section  2.  See  Sections  2.8  and  2.9 
for  pseudo  code  of  the  algorithm,  and  a  discussion  of  the  variations.  We  do  not 
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Original 

Converted 

Problem(Nanie) 

Rows 

Columns 

Rows 

Coluitms 

l(AFIRO) 

28 

32 

28 

51 

2(SC50A) 

51 

48 

51 

78 

3(SC50B) 

51 

48 

51 

78 

4(ADLITTLE) 

57 

97 

57 

138 

5(BLEND) 

75 

83 

75 

114 

6(SHARE2B) 

97 

79 

97 

162 

7(SC105) 

106 

163 

8(STOCFORl) 

118 

111 

118 

165 

9(RECIPE) 

92 

212 

298 

10(SCAGR7) 

130 

185 

ll(BOEING2) 

244 

382 

12(ISRAEL) 

mm 

175 

316 

13(SHARElB) 

■01 

118 

253 

14(VTP.BASE) 

199 

203 

346 

475 

15(SC205) 

206 

203 

206 

317 

16(GROW7) 

421 

581 

17(BEACONFD) 

174 

295 

18(BRANDY) 

221 

303 

19(SCSD1) 

760 

78 

760 

20(E226) 

282 

224 

472 

21(FORPLAN) 

162 

421 

187 

514 

22(BORE3D) 

234 

315 

247 

346 

23(AGG) 

489 

163 

489 

615 

24(CAPR1) 

272 

353 

419 

613 

25(SCORPION) 

389 

358 

389 

466 

26(BANDM) 

306 

472 

472 

27(SCTAP1) 

28(SCFXM1) 

331 

457 

331 

29(STAIR) 

357 

467 

445 

30(SCSD6) 

148 

148 

Table  3- 1 :  Size  of  Test  Problems 


consider  the  “pure”  algorithm  at  all,  as  the  variation  allowing  free  variables  is 
always  more  efficient  when  they  are  present.  Thus  we  will  consider  the  following 
four  least-squares  algorithms: 

PIF:  The  least-squares  Phase  1  algorithm,  in  which  free  variables  have  not 
been  explicitly  converted  to  pairs  of  nonnegative  variables. 

PIDIF:  The  same  as  PIF,  with  the  exception  that  the  denominator  of  the 
column  selection  rule  is  replaced  by  “1”. 

CBPIF:  The  same  as  PIF,  using  a  crash  basis  as  described  in  Section  2.9. 

CBPIDIF:  The  same  ^ls  PI  DIF,  using  a  crash  basis  as  described  in  Sec¬ 
tion  2.9. 
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All  of  these  least-squares  Phase  I  algorithms  were  implemented  using  dense 
matrix  methods.  QR  factorization  was  used  to  solve  the  embedded  least-squares 
subproblems,  and  this  factorization  was  updated  using  Givens  rotations  as 
columns  were  added  and  dropped  from  the  basis.  See  for  example  [9]  for  details 
on  using  the  QR  factorization  to  solve  least-squares  problems,  and  for  details 
on  Givens  rotations  (also  known  as  plane  rotations). 

The  least-squares  algorithms  are  compared  with  LSSOL,  an  established  pack¬ 
age.  LSSOL  is  written  in  Fortran  77  using  dense  matrix  techniques,  and  solves 
a  class  of  linearly  constrained  quadratic  programming  problems.  .See  [8]  for  a 
more  detailed  description  of  LSSOL.  In  order  to  compare  our  Phase  I  algorithm 
to  LSSOL,  we  use  the  “FS”  option.  This  causes  LSSOL  to  find  a  feasible  so¬ 
lution  to  the  problem  submitted,  using  its  implementation  of  Phase  I  of  the 
simplex  method.  LSSOL  (FS  option)  will  be  denoted  by  FS  in  the  following 
result  tables,  and  by  LSSOL:FS  in  the  text. 

When  looking  at  the  least-squares  Phase  I  algorithms,  we  see  that  there 
is  no  clear-cut  definition  for  an  iteration,  because  the  loop  in  which  columns 
are  dropped  from  the  basis  could  be  executed  a  large  number  of  times  between 
column  selections.  We  decided  to  define  an  iteration  to  occur  whenever  a  least- 
squares  problem  is  solved.  This  means  that  adding  a  single  column  to  the 
basis  is  considered  to  be  an  iteration,  as  is  dropping  a  column  by  forming  a 
convex  combination  (see  Step  3  of  the  algorithm).  In  addition,  consider  the 
situation  in  which  some  elements  of  the  solution  are  0,  and  are  consequently 
dropped.  Although  no  least-squares  problem  is  solved  in  this  case,  enough 
work  is  performed  in  matrix  updates,  etc.,  that  this  is  also  considered  to  be  an 
iteration. 

Table  3-2  displays  the  iteration  counts  of  the  four  least-squares  Ph2ise  I  algo¬ 
rithms  6uid  LSSOL:FS.  Note  that  the  iteration  counts  for  some  of  the  problems 
are  the  same  for  both  the  non-crash-basis  and  crash-basis  versions  of  the  least- 
squares  Phase  I  algorithm  (e.g.  STOCFORl).  This  is  due  to  the  fact  that  not 
all  problems  contain  columns  of  the  form  we  look  for  when  building  the  crash 
basis.  Also  notice  that  some  of  the  iteration  counts  are  0.  This  indicates  the  sit¬ 
uation  when  the  initial  crash  basis  provides  a  feasible  solution,  and  no  iterations 
are  necessary. 

We  see  from  Table  3-2  that  LSSOL:FS  takes  a  number  of  iterations  com¬ 
parable  to  that  taken  by  the  least-squares  Phase  1  algorithms,  with  the  total 
lying  between  PIDIF  and  CBPIF.  As  stated  earlier,  the  notion  of  an  iteration 
in  the  least-squares  Phase  I  algorithms  is  somewhat  nonstandard.  For  this  rea¬ 
son,  actual  run  times  are  probably  a  more  accurate  meeisure  of  the  least-squares 
algorithms’  performance  than  their  iteration  counts. 

Table  3-3  displays  the  run  times  of  the  four  least-squares  Phase  I  algorithms 
and  LSSOLiFS,  reported  in  CPU-seconds.  Note  that  a  number  of  runtimes 
are  reported  as  “0”.  This  indicates  that  the  CPU-times  recorded  for  these 
problems  round  down  to  less  than  one  CPU-second.  We  see  from  Table  3-3  that 
the  least-squares  algorithms  run  in  substantially  less  time  than  does  LS.SOLrFS; 
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Problem(Name) 

PlF 

PlDlF 

CBPIF 

CBPIDIF 

FS 

l(AFIRO) 

7 

7 

0 

0 

8 

2(SC50A) 

20 

20 

0 

0 

3 

3(SC50B) 

5 

5 

0 

0 

0 

4(ADLITTLE) 

56 

56 

27 

27 

70 

5(BLEND) 

8 

8 

0 

0 

49 

6{SHARE2B) 

248 

248 

229 

217 

HI 

7(SC105) 

34 

34 

0 

0 

■9 

8(STOCFORl) 

98 

98 

98 

98 

Ha 

9(RECIPE) 

106 

106 

.37 

37 

10(SCAGR7) 

169 

169 

145 

145 

ll(BOEING2) 

206 

204 

163 

161 

181 

r2(ISRAEL) 

171 

171 

8 

8 

338 

13(SHARE1B) 

189 

189 

176 

176 

150 

H(VTP.BASE) 

770 

561 

488 

507 

1059 

15(SC205) 

61 

61 

0 

0 

2 

16(GROW7) 

0 

0 

203 

17(BEACONFD) 

122 

122 

89 

89 

52 

18(BRANDY) 

365 

365 

340 

340 

300 

19(SCSD1) 

10 

8 

8 

16 

20(E226) 

154 

154 

209 

21(FORPLAN) 

378 

378 

347 

347 

438 

22(BORE3D) 

175 

175 

164 

164 

126 

23(AGG) 

543 

539 

134 

136 

149 

24(CAPRI) 

586 

440 

444 

451 

25(SCORPION) 

321 

321 

293 

293 

27 

26(BANDM) 

484 

485 

419 

419 

369 

27(SCTAP1) 

440 

411 

322 

349 

430 

28(SCFXMl) 

350 

343 

300 

293 

271 

29(STAIR) 

452 

452 

464 

464 

173 

30(SCSD6) 

207 

68 

207 

68 

61 

TOTAL 

7064 

6677 

5054 

4944 

5323 

Table  3-2;  Leaist-Squares  Phase  I  Algorithm  Iteration  Counts 


the  LSSOL:FS  total  time  is  almost  3.5  times  longer  than  the  total  of  the  best 
least-squares  Phaise  I  algorithm  (CBPIDIF).  In  fact,  LSSOL:FS  solved  only  one 
problem  (SCAGR7)  faster  than  did  CBPIDIF. 

Looking  at  the  least-squares  algorithms  more  closely,  the  “denominator-of-1” 
versions  (PIDIF  and  CBPIDIF)  are  quite  similar  to  their  “full  denominator” 
counterparts  (PIF  and  CBPIF).  However,  the  crash  beisis  versions  (CBPIF 
Sind  CBPIDIF)  are  noticeably  faster  than  the  “from  scratch”  versions  (PIF 
and  PlDlF). 

So  far,  we  have  discussed  iteration  count  and  run  times.  We  will  now  con¬ 
sider  in  some  sense  the  path  taken  by  the  least-squares  Phase  I  algorithms.  In 
particular,  we  will  consider  the  norm  of  the  residual  (error  vector)  as  a  function 
of  iteration  count.  For  the  least-squares  Phase  I  algorithms,  this  is  ||  u||,  where 
u  is  as  defined  in  the  pseudocode  algorithm  found  in  Section  2.8 
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Problein(Name) 

PIF 

PlDlF 

CBPIF 

CBPlDlF 

FS 

l(AFIRO) 

0 

0 

0 

0 

1 

0 

0 

0 

0 

1 

3(SC50B) 

0 

0 

0 

0 

1 

4(ADLITTLE) 

A 

2 

1 

1 

5 

5(BLEND) 

0 

1 

0 

0 

6 

6(SHARE2B) 

1? 

12 

11 

11 

15 

7{SC105) 

2 

2 

1 

1 

5 

8(STOCFORl) 

7 

7 

7 

7 

16 

9(RECIPE) 

21 

21 

10 

10 

61 

10{SCAGR7) 

15 

15 

15 

15 

14 

n{BOEING2) 

61 

59 

48 

47 

209 

12(ISRAEL) 

32 

30 

5 

4 

227 

13(SHARE1B) 

18 

17 

17 

18 

51 

14(VTP.BASE) 

416 

311 

271 

277 

2026 

15(SC205) 

11 

11 

2 

2 

34 

16(GROW7) 

209 

11 

11 

801 

17(BEACONFD) 

21 

18 

17 

42 

18(BRANDY) 

78 

80 

75 

208 

19(SCSU1) 

3 

4 

3 

7 

20(E226) 

63 

61 

51 

51 

196 

21(FORPLAN) 

98 

95 

90 

90 

398 

22(BORE3D) 

48 

47 

47 

46 

153 

23(AGG) 

676 

686 

182 

181 

889 

24(CAPRI) 

545 

549 

472 

478 

1438 

182 

182 

172 

173 

277 

26{BANDM) 

244 

242 

224 

223 

571 

27(SCTAP1) 

274 

263 

221 

229 

695 

28{SCFXM1) 

223 

222 

202 

199 

530 

29{STAIR) 

545 

544 

586 

589 

816 

30(SCSD6) 

130 

41 

128 

41 

45 

TOTAL 

3934 

3731 

2876 

2799 

9738 

Table  3-3:  Least-Squares  Phase  I  Algorithm  Run  Times 


Figure  3-1  illustrates  the  typical  behavior  of  ||  u  ||  in  PIF.  We  chose  to  display 
the  behavior  of  only  one  of  the  four  least-squares  Phase  1  algorithms  because 
in  this  aspect,  they  were  virtually  identical.  Notice  that  PIF  finds  a  solution 
“close”  to  feaisibility  very  quickly,  spending  the  v^lst  majority  of  its  iterations 
reducing  || « ||  from  0.1  to  0. 

The  last  comparison  we  will  make  between  the  least-squares  Phase  1  algo¬ 
rithms  and  LSSOL:FS  will  be  based  on  the  accuracy  of  the  final  solution  found. 
When  each  algorithm  found  an  x  that  it  claimed  to  be  a  feasible  solution  (or  as 
close  as  possible  to  such  a  solution),  the  error  quantity  ||  fr  —  Ax  ||  wais  formed 
and  reported.  These  errors  in  the  final  solution  are  displayed  in  T'able  3-4. 

In  most  problems,  there  is  very  little  difference  in  accuracy  between  the  least- 
squares  Phase  I  algorithms  and  LSSOL:FS.  However,  in  the  case  of  FORPLAN, 
the  error  of  the  le^tst-squares  algorithms  is  quite  large  relative  to  the  error  of 


37 


Hull 


CAPRI 


Figure  3-1:  Residual  Behavior  of  PIF  on  CAPRI 


LSSOL:FS.  This  is  not  surprising,  as  the  error  in  the  computed  solution  x  in 
the  least-squaires  algorithms  depends  on  the  square  of  the  condition  number 
of  the  matrix  of  currently  basic  columns  (except  when  B  is  square).  (See, 
for  example  [3].)  This  inaccuracy  can  occasionally  cause  least-squares  Phase  I 
algorithms  to  report  infeasibility  incorrectly. 

It  may  be  possible  to  handle  this  lack  of  accuracy  by  adding  iterative  refine¬ 
ment  to  the  end  of  the  least-squares  Phase  I  algorithms.  That  is,  we  take  the 
(previously  final)  solution  x  and  let  =  x.  Let  B  be  the  matrix  of  currently 
beeic  columns.  We  then  perform  some  (small)  number  of  iterations 

u’  =  b-Bx* 

BSx*  =  «• 
x'+'  =  x‘-l-«x‘. 

In  practice,  we  have  the  QR  factorization  of  fl  as  opposed  to  B,  and  B  is  gen¬ 
erally  rectangular,  so  we  would  actually  perform  iterative  refinement  as  follows: 

(o)‘*‘  = 
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As  long  as  Q  is  stored,  this  would  not  be  too  much  work,  especially  if  only 
a  few  iterations  were  necessary.  Note  that  this  scheme  for  iterative  refinement 
depends  on  Bx  —  b  being  nearly  feasible.  This  may  not  be  the  case  if  we  have 
accumulated  a  significant  amount  of  error.  See  [2]  for  more  details,  as  well  as 
for  the  description  of  a  method  of  iterative  refinement  that  is  not  dependent  on 
the  feasibility  of  Bx  =  6. 

To  summarize,  the  four  least-squares  Phase  I  algorithms  have  very  favorable 
run  times  relative  to  the  simplex  method  as  implemented  by  LSSOL,  with  the 
“crash  basis”  versions  performing  the  best.  They  also  find  a  solution  “close” 
to  feasibility  very  quickly.  However,  due  to  the  least-squares  subproblems  em¬ 
bedded  in  the  four  Phase  I  algorithms,  accuracy  can  be  a  problem  on  larger, 
more  ill-conditioned  problems.  It  is  possible  that  iterative  refinement  of  the  final 
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Problein(Name) 

PIF 

PlDlF 

CBPIF 

CBPlDlF 

LS 

l(AFIRO) 

7 

7 

0 

0 

22 

2(SC50A) 

20 

20 

0 

0 

38 

3(SC50B) 

5 

5 

0 

0 

26 

4(ADLITTLE) 

56 

56 

27 

27 

148 

5(BLEND) 

8 

8 

0 

0 

17 

6(SHARE2B) 

248 

248 

229 

217 

243 

7(SC105) 

34 

34 

0 

0 

88 

8(STOCFORl) 

98 

98 

98 

98 

149 

9(RECIPE) 

106 

106 

37 

37 

204 

10(SCAGR7) 

169 

169 

M5 

145 

251 

ll(BOEING2) 

206 

204 

163 

161 

925 

12(ISRAEL) 

171 

171 

8 

8 

836 

13(SHARE1B) 

189 

189 

176 

176 

409 

14(VTP.BASE) 

770 

561 

488 

507 

2499 

15(SC205) 

61 

61 

0 

0 

218 

16(GROW7) 

9 

620 

17(BEACONFD) 

122 

89 

89 

214 

18(BRANDY) 

3 

365 

340 

340 

598 

19(SCSD1) 

10 

8 

10 

8 

35 

20(E226) 

203 

203 

154 

154 

399 

21(FORPLAN) 

378 

378 

347 

347 

436 

22(BORE3D) 

175 

175 

164 

164 

385 

23(AGG) 

543 

539 

134 

136 

608 

24(CAPRI) 

586 

440 

444 

778 

25(SCORPION) 

321 

321 

293 

293 

320 

26(BANDM) 

484 

485 

419 

419 

874 

27(SCTAP1) 

440 

411 

322 

349 

1210 

28(SCFXM1) 

350 

343 

300 

293 

520 

29(STAIR) 

452 

452 

464 

464 

679 

30(SCSD6) 

207 

68 

207 

68 

178 

TOTAL 

7064 

6677 

5054 

4944 

13927 

Table  3-5;  Phase  I  Algorithm  and  LSSOL;LS  Iteration  Counts 


solution  could  alleviate  such  difficulties,  but  this  variation  was  not  implemented. 

3.2  A  Constrained  Least  Squares  Problem 

So  far  we  have  considered  computation  results  on  problem  (2-1).  However,  as 
we  noted  in  Section  2.10,  our  least-squares  Phase  I  algorithm  actually  solves  (2- 
15),  a  constrained  least-squares  problem.  Thus  we  decided  to  compare  our 
Phase  I  algorithm  with  that  of  LSSOL  when  applied  to  (2-15)  (by  using  the 
“LS”  option).  LSSOL  (LS  option)  will  be  denoted  by  LS  in  the  following  result 
tables,  and  by  LSSOLiLS  in  the  text. 

Table  3-5  displays  the  iteration  counts  of  the  least-squares  Phase  I  algorithms 
considered  in  Section  3.1  and  LSSOL:LS,  and  Table  3-6  displays  the  run  times. 
Again,  the  least-squares  Phase  I  algorithms  do  well,  with  the  best,  CBPIDIF, 
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PlF 

PlDlF 

CBPlF 

CBPIDIF 

LS 

l(AFIRO) 

0 

0 

0 

0 

1 

2(SC50A) 

0 

0 

0 

0 

1 

0 

0 

0 

0 

1 

4(ADLITTLE) 

2 

2 

1 

1 

5 

5(BLEND) 

0 

1 

0 

0 

2 

6(SHARE2B) 

12 

12 

11 

11 

11 

7(SC105) 

2 

2 

1 

1 

mm 

8(STOCFORl) 

7 

9(RECIPE) 

10 

HI 

10(SCAGR7) 

15 

mm 

ll(BOEING2) 

61 

59 

48 

47 

214 

12(ISRAEL) 

32 

5 

4 

109 

13(SHARE1B) 

18 

17 

17 

18 

31 

14(VTP.BASE) 

416 

311 

271 

277 

812 

15(SC205) 

11 

11 

2 

2 

51 

■  III  1  1 

207 

209 

11 

11 

409 

17{BEACONFD) 

20 

21 

18 

17 

55 

18(BRANDY) 

79 

78 

80 

75 

109 

19(SCSD1) 

3 

3 

4 

3 

47 

20(E226) 

63 

61 

51 

51 

151 

21(FORPLAN) 

98 

95 

90 

90 

174 

22(BORE3D) 

48 

47 

47 

46 

120 

23{AGG) 

676 

686 

182 

181 

608 

24(CAPR1) 

545 

549 

472 

478 

529 

25(SCORPION) 

182 

182 

172 

173 

217 

26(BANDM) 

244 

242 

409 

27(SCTAP1) 

274 

263 

221 

229 

711 

28(SCFXM1) 

223 

222 

202 

199 

400 

29(STAIR) 

545 

544 

586 

589 

573 

130 

41 

128 

41 

342 

TOTAL 

3934 

3731 

2876 

2799 

6177 

Table  3-6:  Phase  1  Algorithm  and  LSSOL:LS  Run  Times 

running  in  45%  of  the  total  time  taken  by  LSSOL:LS.  Apparently,  our  least- 
squares  Phase  I  algorithms  are  abo  efficient  in  solving  this  particular  kind  of 
constreiined  least-squares  problem. 

4  Summary  and  Conclusions 

Section  2  developed  a  Phase  1  algorithm  using  least-squares  subproblems.  Sub¬ 
problems  are  solved  to  provide  both  an  approximation  to  the  right-hand  side 
that  is  as  close  as  possible  in  the  2-norm  (given  the  current  basic  columns), 
and  to  select  the  next  incoming  column.  The  incoming  column-selection  rule  is 
myopic,  selecting  that  column  whose  nonnegative  combination  with  the  current 
approximation  brings  us  closest  to  the  right-hand  side.  This  Phase  I  algorithm 
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has  the  property  of  strict  improvement  at  each  iteration,  even  in  the  presence 
of  degeneracy  (in  the  simplex  method  sense).  Finally,  equivalences  between  the 
least-squares  Phase  1  algorithm  and  other  algorithms  were  discussed. 

Four  variations  of  this  least-squares  Phase  I  algorithm  were  developed,  and 
the  computational  results  were  excellent.  The  best  least-squares  Phase  1  algo¬ 
rithm  ran  almost  3.5  times  faster  than  LSSOL’s  implementation  of  Phase  I  of 
the  simplex  method.  In  addition,  the  same  algorithm  ran  2.2  times  faster  than 
LSSOL’s  constrained  LS  solver,  when  applied  to  problems  of  the  form  (2-15). 

Future  Work 

We  have  demonstrated  the  initial  proof-of-concept  of  the  Phase  1  algorithm 
developed  in  Section  2.  In  order  for  this  algorithm  to  be  commercially  com¬ 
petitive,  it  must  be  fine-tuned,  just  as  the  simplex  method  has  been  since  its 
discovery.  In  particular,  the  problem  of  inaccuracy  on  unstable  problems  must 
be  addressed,  perhaps  with  the  inclusion  of  iterative  refinement  of  the  solu¬ 
tions  to  the  embedded  least-squares  subproblems.  In  addition,  these  algorithms 
should  be  implemented  using  sparse  matrix  methods  to  see  how  they  compare 
to  sparse  matrix  implementations  of  the  simplex  method. 
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12b.  distrirution  code 


Although  the  simplex  method's  performance  in  solving  linear  programming  problems  is  usually  quite  good,  it 
does  not  guarantee  strict  improvement  at  each  iteration  on  degenerate  problems.  Instead  of  trying  to  recognize 
and  avoid  degenerate  steps  in  the  simplex  method  (as  some  variants  do),  we  have  developed  a  new  Phase  I 
algorithm  that  is  completely  impervious  to  degeneracy,  with  a  strict  improvement  attained  at  each  iteration.  It  is 
also  noted  that  the  new  Phase  I  Algorithm  is  closely  related  to  a  number  of  existing  algorithms. 

When  tested  on  the  30  smallest  NETLIB  linear  prograggming  test  problems,  the  computational  results  for 
the  new  Phase  I  algorithm  were  almost  3.5  times  faster  than  the  simplex  method;  on  some  problems,  it  was  over 
10  times  faster. 
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